Abstract. In this paper, we study the Learning With Errors problem and its binary variant, where
secrets and errors are binary or taken in a small interval. We introduce a new variant of the Blum,
Kalai and Wasserman algorithm, relying on a quantization step that generalizes and fine-tunes modulus
switching. In general this new technique yields a significant gain in the constant in front of the exponent
in the overall complexity. We illustrate this by solving within half a day a LWE instance with dimension
$n = 128$, modulus $q = n^2$, Gaussian noise $\alpha = 1/(\sqrt{n/\pi}\,\log^2 n)$ and binary secret, using $2^{28}$ samples,
while the previous best result based on BKW claims a time complexity of $2^{74}$ with $2^{60}$ samples for the
same parameters.

We then introduce variants of BDD, GapSVP and UniqueSVP, where the target point is required to lie 
in the fundamental parallelepiped, and show how the previous algorithm is able to solve these variants 
in subexponential time. Moreover, we also show how the previous algorithm can be used to solve the 
BinaryLWE problem with $n$ samples in subexponential time $2^{(\ln 2/2+o(1))n/\log\log n}$. This analysis does
not require any heuristic assumption, contrary to other algebraic approaches; instead, it uses a variant 
of an idea by Lyubashevsky to generate many samples from a small number of samples. This makes 
it possible to asymptotically and heuristically break the NTRU cryptosystem in subexponential time 
(without contradicting its security assumption). We are also able to solve subset sum problems in 
subexponential time for density $o(1)$, which is of independent interest: for such density, the previous
best algorithm requires exponential time. As a direct application, we can solve in subexponential time 
the parameters of a cryptosystem based on this problem proposed at TCC 2010. 


1 Introduction 

The Learning With Errors (LWE) problem has been an important problem in cryptography since its
introduction by Regev in [46]. Many cryptosystems have been proven secure assuming the hardness of this
problem, including Fully Homomorphic Encryption schemes [21,14]. The decision version of the problem can
be described as follows: given $m$ samples of the form $(a, b) \in (\mathbb{Z}_q)^n \times \mathbb{Z}_q$, where the $a$ are uniformly distributed in
$(\mathbb{Z}_q)^n$, distinguish whether $b$ is uniformly chosen in $\mathbb{Z}_q$ or is equal to $\langle a, s\rangle + e$ for a fixed secret $s \in (\mathbb{Z}_q)^n$ and
$e$ a noise value in $\mathbb{Z}_q$ chosen according to some probability distribution. Typically, the noise is sampled from
some distribution concentrated on small numbers, such as a discrete Gaussian distribution with standard
deviation $\alpha q$ for $\alpha = o(1)$. In the search version of the problem, the goal is to recover $s$ given the promise that
the sample instances come from the latter distribution. Initially, Regev showed that if $\alpha q \ge 2\sqrt{n}$, solving
LWE on average is at least as hard as approximating lattice problems in the worst case to within $\tilde{O}(n/\alpha)$
factors with a quantum algorithm. Peikert shows a classical reduction when the modulus is large, $q \ge 2^n$,
in [44]. Finally, in [13], Brakerski et al. prove that solving LWE instances with polynomial-size modulus in
polynomial time implies an efficient solution to GapSVP.

There are basically three approaches to solving LWE: the first relies on lattice reduction techniques such 
as the LLL [32] algorithm and further improvements [15] as exposed in [34,35]; the second uses combinatorial 
techniques [12,47]; and the third uses algebraic techniques [9]. According to Regev in [1], the best known 
algorithm to solve LWE is the algorithm by Blum, Kalai and Wasserman in [12], originally proposed to solve 
the Learning Parities with Noise (LPN) problem, which can be viewed as a special case of LWE where q = 2. 
The time and memory requirements of this algorithm are both exponential for LWE and subexponential for
LPN, in $2^{O(n/\log n)}$. During the first stage of the algorithm, the dimension of $a$ is reduced, at the cost of a
(controlled) decrease of the bias of $b$. During the second stage, the algorithm distinguishes between LWE and
uniform by evaluating the bias.

Since the introduction of LWE, some variants of the problem have been proposed in order to build more 
efficient cryptosystems. Some of the most interesting variants are Ring-LWE by Lyubashevsky, Peikert and 
Regev in [38], which aims to reduce the space of the public key using cyclic samples; and the cryptosystem 
by Döttling and Müller-Quade [18], which uses short secret and error. In 2013, Micciancio and Peikert [40]
as well as Brakerski et al. [13] proposed a binary version of the LWE problem and obtained a hardness result. 

Related Work. Albrecht et al. have presented an analysis of the BKW algorithm as applied to LWE 
in [4,5]. It has been recently revisited by Duc et al., who use a multi-dimensional FFT in the second stage of
the algorithm [19]. However, the main bottleneck is the first BKW step and since the proposed algorithms 
do not improve this stage, the overall asymptotic complexity is unchanged. 

In the case of the BinaryLWE variant, where the error and secret are binary (or sufficiently small),
Micciancio and Peikert show that solving this problem using $m = n(1 + \Omega(1/\log(n)))$ samples is at least
as hard as approximating lattice problems in the worst case in dimension $\Theta(n/\log(n))$ with approximation
factor $\tilde{O}(\sqrt{n}\,q)$. We show in section B that existing lattice reduction techniques require exponential time.
Arora and Ge describe a $2^{\tilde{O}((\alpha q)^2)}$-time algorithm when $q > n$ to solve the LWE problem [9]. This leads to a
subexponential time algorithm when the error magnitude $\alpha q$ is less than $\sqrt{n}$. The idea is to transform this
system into a noise-free polynomial system and then use root finding algorithms for multivariate polynomials
to solve it, using either relinearization in [9] or Gröbner bases in [3]. In this last work, Albrecht et al. present an
algorithm whose time complexity is $2^{\frac{(\omega+o(1))\,n\log\log\log n}{8\log\log n}}$ when the number of samples $m = (1+o(1))\,n\log\log n$
is super-linear, where $\omega < 2.3728$ is the linear algebra constant, under some assumption on the regularity of
the polynomial system of equations; and when $m = O(n)$, the complexity becomes exponential.

Contribution. Our first contribution is to present in a unified framework the BKW algorithm and all 
its previous improvements in the binary case [33,28,11,25] and in the general case [5]. We introduce a new 
quantization step, which generalizes modulus switching [5]. This yields a significant decrease in the constant 
of the exponential of the complexity for LWE. Moreover our proof does not require Gaussian noise, and does 
not rely on unproven independence assumptions. Our algorithm is also able to tackle problems with larger 
noise. 

We then introduce generalizations of the BDD, GapSVP and UniqueSVP problems, and prove a reduction 
from these variants to LWE. When particular parameters are set, these variants impose that the lattice point 
of interest (the point of the lattice that the problem essentially asks to locate: for instance, in the case of 
BDD, the point of the lattice closest to the target point) lie in the fundamental parallelepiped; or more 
generally, we ask that the coordinates of this point relative to the basis defined by the input matrix A have
small infinity norm, bounded by some value B. For small B, our main algorithm yields a subexponential-time 
algorithm for these variants of BDD, GapSVP and UniqueSVP. 

Through a reduction to our variant of BDD, we are then able to solve the subset-sum problem in
subexponential time when the density is $o(1)$, and in time $2^{(\ln 2/2+o(1))n/\log\log n}$ if the density is $O(1/\log n)$. This is of
independent interest, as existing techniques for density $o(1)$, based on lattice reduction, require exponential
time. As a consequence, the cryptosystems of Lyubashevsky, Palacio and Segev at TCC 2010 [37] can be
solved in subexponential time.

As another application of our main algorithm, we show that BinaryLWE with reasonable noise can be
solved in time $2^{(\ln 2/2+o(1))n/\log\log n}$ instead of $2^{\Omega(n)}$; and the same complexity holds for secret of size up to
$2^{\log^{o(1)} n}$. As a consequence, we can heuristically recover the secret polynomials $f, g$ of the NTRU problem
in subexponential time $2^{(\ln 2/2+o(1))n/\log\log n}$ (without contradicting its security assumption). The heuristic
assumption comes from the fact that NTRU samples are not random, since they are rotations of each other:
the heuristic assumption is that this does not significantly hinder BKW-type algorithms. Note that there is a
large value hidden in the $o(1)$ term, so that our algorithm does not yield practical attacks for recommended
NTRU parameters.
2 Preliminaries


We identify any element of $\mathbb{Z}/q\mathbb{Z}$ with the representative of smallest absolute value in its equivalence
class, choosing the positive one in case of a tie.
Any vector $x \in (\mathbb{Z}/q\mathbb{Z})^n$ has a Euclidean norm $\|x\| = \sqrt{\sum_{i=0}^{n-1} x_i^2}$ and $\|x\|_\infty = \max_i |x_i|$. A matrix $B$ can
be Gram-Schmidt orthogonalized in $\tilde{B}$, and its norm $\|B\|$ is the maximum of the norm of its columns. We
denote by $(x|y)$ the vector obtained as the concatenation of vectors $x, y$. Let $I$ be the identity matrix;
we denote by ln the natural logarithm and by log the binary logarithm. A lattice is the set of all integer linear
combinations $\Lambda(b_1, \ldots, b_n) = \sum_i b_i \cdot x_i$ (where $x_i \in \mathbb{Z}$) of a set of linearly independent vectors $b_1, \ldots, b_n$
called the basis of the lattice. If $B = [b_1, \ldots, b_n]$ is the matrix basis, lattice vectors can be written as $Bx$ for
$x \in \mathbb{Z}^n$. Its dual $\Lambda^*$ is the set of $x \in \mathbb{R}^n$ such that $\langle x, \Lambda\rangle \subset \mathbb{Z}$. We have $\Lambda^{**} = \Lambda$. We borrow Bleichenbacher's
definition of bias [42].

Definition 1. The bias of a probability distribution $\phi$ over $\mathbb{Z}/q\mathbb{Z}$ is

$$\mathbb{E}_{x\sim\phi}[\exp(2i\pi x/q)].$$

This definition extends the usual definition of the bias of a coin in $\mathbb{Z}/2\mathbb{Z}$: it preserves the fact that any
distribution with bias $b$ can be distinguished from uniform with constant probability using $\Omega(1/b^2)$ samples,
as a consequence of Hoeffding's inequality; moreover the bias of the sum of two independent variables is still
the product of their biases. We also have the following simple lemma:

Lemma 1. The bias of the Gaussian distribution of mean 0 and standard deviation $q\alpha$ is $\exp(-2\pi^2\alpha^2)$.

Proof. The bias is the value of the Fourier transform at $-1/q$. □
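
The two properties of the bias used throughout (multiplicativity under sums of independent variables, and the closed form of Lemma 1) can be checked numerically. The following is a minimal sketch, not part of the original algorithm: the function names `bias`, `sum_distribution` and `gaussian_bias` are illustrative, and the Gaussian is treated as a continuous distribution integrated by a plain Riemann sum.

```python
import cmath
import math

def bias(pmf, q):
    """Bias E[exp(2*i*pi*x/q)] of a distribution given as {residue: probability}."""
    return sum(p * cmath.exp(2j * math.pi * x / q) for x, p in pmf.items())

def sum_distribution(pmf_x, pmf_y, q):
    """Distribution of x + y mod q for independent x and y."""
    out = {}
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            z = (x + y) % q
            out[z] = out.get(z, 0.0) + px * py
    return out

def gaussian_bias(q, alpha, steps_per_sigma=200, width=12):
    """Riemann-sum estimate of E[cos(2*pi*x/q)] for x ~ N(0, (q*alpha)^2).

    The imaginary part of the bias vanishes by symmetry, so only the cosine
    is integrated. The grid resolution and cut-off are arbitrary choices.
    """
    sigma = q * alpha
    h = sigma / steps_per_sigma
    total = 0.0
    for i in range(-width * steps_per_sigma, width * steps_per_sigma + 1):
        x = i * h
        density = math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))
        total += density * math.cos(2 * math.pi * x / q) * h
    return total
```

For a small symmetric noise such as `{0: 0.5, 1: 0.25, q - 1: 0.25}`, the bias of the sum of two independent copies equals the square of the bias of one copy, and `gaussian_bias(q, alpha)` agrees with `math.exp(-2 * math.pi**2 * alpha**2)` up to numerical error.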

We introduce a non-standard definition for the LWE problem. However as a consequence of Lemma 1,
this new definition naturally extends the usual Gaussian case (as well as its standard extensions such as the
bounded noise variant [13, Definition 2.14]), and it will prove easier to work with. The reader can consider
the distortion parameter $\varepsilon = 0$, as is the case in other papers, and a Gaussian of standard deviation $\alpha q$.

Definition 2. Let $n \ge 0$ and $q \ge 2$ be integers. Given parameters $\alpha$ and $\varepsilon$, the LWE distribution is, for
$s \in (\mathbb{Z}/q\mathbb{Z})^n$, a distribution on pairs $(a, b) \in (\mathbb{Z}/q\mathbb{Z})^n \times (\mathbb{R}/q\mathbb{Z})$ such that $a$ is sampled uniformly, and for
all $a$,

$$|\mathbb{E}[\exp(2i\pi(\langle a, s\rangle - b)/q) \mid a]\,\exp(\alpha'^2) - 1| \le \varepsilon$$

for some universal $\alpha' \le \alpha$.

For convenience, we define $\beta = \sqrt{n/2}/\alpha$. In the remainder, $\alpha$ is called the noise parameter¹, and $\varepsilon$ the
distortion parameter. Also, we say that a LWE distribution has a noise distribution $\phi$ if $b$ is distributed as
$\langle a, s\rangle + \phi$.

Definition 3. The Decision-LWE problem is to distinguish a LWE distribution from the uniform distribution
over $(a, b)$. The Search-LWE problem is, given samples from a LWE distribution, to find $s$.

Definition 4. The real $\lambda_i$ is the radius of the smallest ball, centered in 0, such that it contains $i$ vectors of
the lattice $\Lambda$ which are linearly independent.

We define $\rho_s(x) = \exp(-\pi\|x\|^2/s^2)$ and $\rho_s(S) = \sum_{x\in S}\rho_s(x)$ (and similarly for other functions). The
discrete Gaussian distribution $D_{E,s}$ over a set $E$ and of parameter $s$ is such that the probability $D_{E,s}(x)$
of drawing $x \in E$ is equal to $\rho_s(x)/\rho_s(E)$. To simplify notation, we will denote by $D_E$ the distribution $D_{E,1}$.

Definition 5. The smoothing parameter of the lattice $\Lambda$ is the smallest $s$ such that $\rho_{1/s}(\Lambda^*) = 1 + \varepsilon$.

Now, we will generalize the BDD, UniqueSVP and GapSVP problems by using another parameter $B$ that
bounds the target lattice vector. For $B = 2^n$, we recover the usual definitions if the input matrix is reduced.


¹ Remark that it differs by a constant factor from other authors' definition of $\alpha$.
Definition 6. The $\mathrm{BDD}^{\|.\|_\infty}_{B,\beta}$ (resp. $\mathrm{BDD}^{\|.\|}_{B,\beta}$) problem is, given a basis $A$ of the lattice $\Lambda$, and a point $x$
such that $\|As - x\| \le \lambda_1/\beta < \lambda_1/2$ and $\|s\|_\infty \le B$ (resp. $\|s\| \le B$), to find $s$.

Definition 7. The $\mathrm{UniqueSVP}^{\|.\|_\infty}_{B,\beta}$ (resp. $\mathrm{UniqueSVP}^{\|.\|}_{B,\beta}$) problem is, given a basis $A$ of the lattice $\Lambda$, such
that $\lambda_2/\lambda_1 \ge \beta$ and there exists $s$ such that $\|As\| = \lambda_1$ with $\|s\|_\infty \le B$ (resp. $\|s\| \le B$), to find $s$.

Definition 8. The $\mathrm{GapSVP}^{\|.\|_\infty}_{B,\beta}$ (resp. $\mathrm{GapSVP}^{\|.\|}_{B,\beta}$) problem is, given a basis $A$ of the lattice $\Lambda$, to distinguish
between $\lambda_1(\Lambda) \ge \beta$ and the existence of $s \ne 0$ such that $\|s\|_\infty \le B$ (resp. $\|s\| \le B$) and $\|As\| \le 1$.

Definition 9. Given two probability distributions $P$ and $Q$ on a finite set $S$, the Kullback-Leibler (or KL)
divergence between $P$ and $Q$ is

$$D_{KL}(P\|Q) = \sum_{x\in S}\ln\left(\frac{P(x)}{Q(x)}\right)P(x) \quad\text{with } \ln(x/0) = +\infty \text{ if } x > 0.$$

The following two lemmata are proven in [45]:

Lemma 2. Let $P$ and $Q$ be two distributions over $S$, such that for all $x$, $|P(x) - Q(x)| \le \delta(x)P(x)$ with
$\delta(x) \le 1/4$. Then:

$$D_{KL}(P\|Q) \le 2\sum_{x\in S}\delta(x)^2P(x).$$

Lemma 3. Let $\mathcal{A}$ be an algorithm which takes as input $m$ samples of $S$ and outputs a bit. Let $x$ (resp. $y$)
be the probability that it returns 1 when the input is sampled from $P$ (resp. $Q$). Then:

$$|x - y| \le \sqrt{m\,D_{KL}(P\|Q)/2}.$$

Finally, we say that an algorithm has a negligible probability of failure if its probability of failure is
$2^{-\Omega(n)}$.²

2.1 Secret-Error Switching 

At a small cost in samples, it is possible to reduce any LWE distribution with noise distribution $\phi$ to an
instance where the secret follows the rounded distribution, defined as $\lfloor\phi\rceil$ [7,13].

Theorem 1. Given an oracle that solves LWE with $m$ samples in time $t$ with the secret coming from the
rounded error distribution, it is possible to solve LWE with $m + O(n\log\log q)$ samples with the same error
distribution (and any distribution on the secret) in time $t + O(mn^2 + (n\log\log q)^3)$, with negligible probability
of failure.

Furthermore, if $q$ is prime, we lose $n + k$ samples with probability of failure bounded by $q^{-1-k}$.

Proof. First, select an invertible matrix $A$ from the vectorial part of $O(n\log\log q)$ samples in time $O((n\log\log q)^3)$
[13, Claim 2.13].

Let $b$ be the corresponding rounded noisy dot products. Let $s$ be the LWE secret and $e$ such that
$As + e = b$. Then the subsequent $m$ samples are transformed in the following way. For each new sample
$(a', b')$ with $b' = \langle a', s\rangle + e'$, we give the sample $(-{}^tA^{-1}a',\ b' - \langle {}^tA^{-1}a', b\rangle)$ to our LWE oracle.

Clearly, the vectorial part of the new samples remains uniform and since

$$b' - \langle {}^tA^{-1}a', b\rangle = \langle -{}^tA^{-1}a', b - As\rangle + b' - \langle a', s\rangle = \langle -{}^tA^{-1}a', e\rangle + e'$$

the new errors follow the same distribution as the original, and the new secret is $e$. Hence the oracle outputs
$e$ in time $t$, and we can recover $s$ as $s = A^{-1}(b - e)$.

If $q$ is prime, the probability that the $n + k$ first samples are in some hyperplane is bounded by
$q^{n-1}q^{-n-k} = q^{-1-k}$. □
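
The transformation in the proof can be exercised on toy parameters. The sketch below is only an illustration under stated assumptions: $q$ is prime (so that Gauss-Jordan elimination applies directly), errors are integers, and the helper names `lwe_sample`, `inverse_mod`, `switch_sample` and `recover_secret` are hypothetical, not taken from the paper.

```python
import random

def dot(u, v, q):
    return sum(x * y for x, y in zip(u, v)) % q

def lwe_sample(s, q, rng):
    """Hypothetical toy sampler: uniform a, error drawn from {-1, 0, 1}."""
    a = [rng.randrange(q) for _ in s]
    e = rng.choice([-1, 0, 1])
    return a, (dot(a, s, q) + e) % q, e

def inverse_mod(A, q):
    """Inverse of a square matrix modulo a prime q (Gauss-Jordan), or None if singular."""
    n = len(A)
    M = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] % q), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, q)  # modular inverse, Python 3.8+
        M[col] = [v * inv % q for v in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(v - f * w) % q for v, w in zip(M[r], M[col])]
    return [row[n:] for row in M]

def switch_sample(A_inv, b, a_new, b_new, q):
    """Map (a', b') to (-tA^{-1} a', b' - <tA^{-1} a', b>), whose secret is e."""
    n = len(b)
    # u = transpose(A^{-1}) * a'
    u = [sum(A_inv[i][j] * a_new[i] for i in range(n)) % q for j in range(n)]
    return [(-x) % q for x in u], (b_new - dot(u, b, q)) % q

def recover_secret(A_inv, b, e, q):
    """Recover s = A^{-1} (b - e) once the oracle has returned e."""
    diff = [(x - y) % q for x, y in zip(b, e)]
    return [dot(row, diff, q) for row in A_inv]
```

Each transformed sample $(a_{new}, b_{new})$ satisfies $b_{new} - \langle a_{new}, e\rangle = e'$ modulo $q$, which is the identity displayed in the proof; the rows of `A` here are the vectorial parts of the first $n$ samples.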

² Some authors use another definition.
2.2 Low dimension algorithms


Our main algorithm will return samples from a LWE distribution, while the bias decreases. We describe two
fast algorithms when the dimension is small enough.

Theorem 2. If $n = 0$ and $m = k/b^2$, with $b$ smaller than the real part of the bias, the Decision-LWE problem
can be solved with advantage $1 - 2^{-\Omega(k)}$ in time $O(m)$.

Proof. The algorithm Distinguish computes $x = \frac{1}{m}\sum_{i=0}^{m-1}\cos(2\pi b_i/q)$ and returns the boolean $x \ge b/2$. If
we have a uniform distribution then the average of $x$ is 0, else it is larger than $b$ (and in particular than $b/2$).
The Hoeffding inequality shows that the probability of $|x - \mathbb{E}[x]| \ge b/2$ is $2^{-k/8}$, which gives the result. □
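
As an illustration of the proof, Distinguish fits in a few lines. This is a minimal sketch: the argument names are ours, `bs` is the list of second components $b_i$ of the fully reduced samples, and `b` is the assumed lower bound on the real part of the bias.

```python
import math

def distinguish(bs, q, b):
    """Return True when the empirical mean of cos(2*pi*b_i/q) reaches b/2."""
    x = sum(math.cos(2 * math.pi * v / q) for v in bs) / len(bs)
    return x >= b / 2
```

On values concentrated around 0 modulo $q$ the mean is close to 1 and the function returns `True`; on values spread evenly over all residues the mean is close to 0 and it returns `False`.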


Algorithm 1 FindSecret
function FindSecret($\mathcal{L}$)
    for all $(a, b) \in \mathcal{L}$ do
        $f[a] \leftarrow f[a] + \exp(2i\pi b/q)$
    end for
    $t \leftarrow$ FastFourierTransform($f$)
    return $\arg\max_{s \in (\mathbb{Z}/q\mathbb{Z})^n} \Re(t[s])$
end function


Lemma 4. For all $s \ne 0$, if $a$ is sampled uniformly, $\mathbb{E}[\exp(2i\pi\langle a, s\rangle/q)] = 0$.

Proof. Multiplication by $s_0$ in $\mathbb{Z}_q$ is $\gcd(s_0, q)$-to-one because it is a group morphism, therefore $a_0s_0$ is
uniform over $\gcd(s_0, q)\mathbb{Z}_q$. Thus, by using $k = \gcd(q, s_0, \ldots, s_{n-1}) < q$, $\langle a, s\rangle$ is distributed uniformly over
$k\mathbb{Z}_q$ so

$$\mathbb{E}[\exp(2i\pi\langle a, s\rangle/q)] = \frac{k}{q}\sum_{j=0}^{q/k-1}\exp(2i\pi jk/q) = 0. \quad □$$

Theorem 3. The algorithm FindSecret, when given $m > (8n\log q + k)/b^2$ samples from a LWE problem
with bias whose real part is superior to $b$, returns the correct secret in time $O(m + n\log^2(q)\,q^n)$ except with
probability $2^{-\Omega(k)}$.

Proof. The fast Fourier transform needs $O(nq^n)$ operations on numbers of bit size $O(\log(q))$. The Hoeffding
inequality shows that the difference between $t[s']$ and $\mathbb{E}[\exp(2i\pi(b - \langle a, s'\rangle)/q)]$ is at most $b/2$ except with
probability at most $2\exp(-mb^2/2)$. It holds for all $s'$ except with probability at most $2q^n\exp(-mb^2/2) =
2^{-\Omega(k)}$ using the union bound. Then $t[s] \ge b - b/2 = b/2$ and for all $s' \ne s$, $t[s'] < b/2$ so the algorithm
returns $s$. □
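
The scoring rule of FindSecret can be illustrated on tiny parameters. The sketch below is not the algorithm as analysed above: it replaces the fast Fourier transform by a direct evaluation of $t[s] = \sum_a f[a]\exp(-2i\pi\langle a, s\rangle/q)$ for every candidate $s$, which costs $q^n$ times the number of distinct vectors and is only meant for toy sizes. The helper `toy_samples` and its noise in $\{-1, 0, 1\}$ are assumptions made for the example.

```python
import cmath
import itertools
import math
import random

def toy_samples(secret, m, q, rng):
    """Hypothetical toy LWE samples with noise drawn from {-1, 0, 1}."""
    out = []
    for _ in range(m):
        a = [rng.randrange(q) for _ in secret]
        e = rng.choice([-1, 0, 1])
        out.append((a, (sum(x * y for x, y in zip(a, secret)) + e) % q))
    return out

def find_secret(samples, n, q):
    """Return the s maximising Re(t[s]), with t computed by a direct sum (no FFT)."""
    f = {}
    for a, b in samples:
        key = tuple(a)
        f[key] = f.get(key, 0j) + cmath.exp(2j * math.pi * b / q)

    def score(s):
        return sum(
            v * cmath.exp(-2j * math.pi * sum(x * y for x, y in zip(a, s)) / q)
            for a, v in f.items()
        ).real

    return list(max(itertools.product(range(q), repeat=n), key=score))
```

For the correct secret every term has phase $2\pi e/q$ with small $e$, so the real part of the sum grows linearly in the number of samples, while for any other candidate the phases are uniform by Lemma 4 and the sum stays small.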


3 Main algorithm 

In this section, we present our main algorithm, prove its asymptotic complexity, and present practical
results in dimension n = 128.
3.1 Rationale


A natural idea in order to distinguish between an instance of LWE (or LPN) and a uniform distribution is
to select some $k$ samples that add up to zero, yielding a new sample of the form $(0, e)$. It is then enough to
distinguish between $e$ and a uniform variable. However, if $\delta$ is the bias of the error in the original samples,
the new error $e$ has bias $\delta^k$, hence roughly $\delta^{-2k}$ samples are necessary to distinguish it from uniform. Thus
it is crucial that $k$ be as small as possible.
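
The cost of combining samples can be made concrete with a two-line computation. This is only an order-of-magnitude sketch: constants coming from Hoeffding's inequality are ignored, and the function names are ours.

```python
def bias_after_sum(delta, k):
    """Bias of the sum of k independent errors, each of bias delta."""
    return delta ** k

def samples_to_distinguish(delta, k):
    """Rough number of samples, delta^(-2k), needed to tell the sum from uniform."""
    return bias_after_sum(delta, k) ** -2
```

With $\delta = 1/2$, going from $k = 3$ to $k = 4$ raises the estimate from 64 to 256: each additional summand multiplies the required number of samples by $\delta^{-2}$.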

The idea of the algorithm by Blum, Kalai and Wasserman (BKW) is to perform “blockwise” Gaussian
elimination. The $n$ coordinates are divided into $k$ blocks of length $b = n/k$. Then, samples that are equal
on the first $b$ coordinates are subtracted together to produce new samples that are zero on the first block.
This process is iterated over each consecutive block. Eventually samples of the form $(0, e)$ are obtained.

Each of these samples ultimately results from the addition of $2^k$ starting samples, so $k$ should be at most
$O(\log(n))$ for the algorithm to make sense. On the other hand $\Omega(q^b)$ data are clearly required at each step
in order to generate enough collisions on $b$ consecutive coordinates of a block. This naturally results in a
complexity roughly $2^{(1+o(1))n/\log(n)}$ in the original algorithm for LPN. This algorithm was later adapted to
LWE in [4], and then improved in [5].

The idea of the latter improvement is to use so-called “lazy modulus switching”. Instead of finding two
vectors that are equal on a given block in order to generate a new vector that is zero on the block, one
uses vectors that are merely close to each other. This may be seen as performing addition modulo $p$ instead
of $q$ for some $p < q$, by rounding every value $x \in \mathbb{Z}_q$ to the value nearest $xp/q$ in $\mathbb{Z}_p$. Thus at each step
of the algorithm, instead of generating vectors that are zero on each block, small vectors are produced.
This introduces a new “rounding” error term, but essentially reduces the complexity from roughly $q^b$ to $p^b$.
Balancing the new error term with this decrease in complexity results in a significant improvement.

However it may be observed that this rounding error is much more costly for the first few blocks than
the last ones. Indeed samples produced after, say, one iteration step are bound to be added together $2^{k-1}$
times to yield the final samples, resulting in a corresponding blowup of the rounding error. By contrast, later
terms will undergo fewer additions. Thus it makes sense to allow for progressively coarser approximations (i.e.
decreasing the modulus) at each step. On the other hand, to maintain comparable data requirements to find
collisions on each block, the decrease in modulus is compensated by progressively longer blocks.

What we propose here is a more general view of the BKW algorithm that allows for this improvement, 
while giving a clear view of the different complexity costs incurred by various choices of parameters. Balancing
these terms is the key to finding an optimal complexity. We forego the “modulus switching” point of view 
entirely, while retaining its core ideas. The resulting algorithm generalizes several variants of BKW, and will 
be later applied in a variety of settings. 

Also, each time we combine two samples, we never use these two samples again, so that the combined
samples are independent. Previous works repeatedly used one of the two samples, so that independence can
only be attained by repeating the entire algorithm for each sample needed by the distinguisher, as was done
in [12].


3.2 Quantization 

The goal of quantization is to associate to each point a center from a small set, such that the expectation
of the distance between a point and its center is small. We will then be able to produce small vectors by
subtracting vectors associated to the same center.

Modulus switching amounts to a simple quantizer which rounds every coordinate to the nearest multiple 
of some constant. Our proven algorithm uses a similar quantizer, except the constant depends on the index 
of the coordinate. 

It is possible to decrease the average distance from a point to its center by a constant factor for large 
moduli [24] , but doing so would complicate our proof without improving the leading term of the complexity. 
When the modulus is small, it might be worthwhile to use error-correcting codes as in [25]. 
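
A coordinate-wise quantizer of the kind described above can be sketched as follows. The names `centered` and `quantize` and the list `bounds` (holding $d_0, \ldots, d_k$) are illustrative, and rounding half up is an arbitrary tie-breaking choice made for this example.

```python
import math

def centered(x, q):
    """Representative of x mod q of smallest absolute value (positive on ties)."""
    x %= q
    return x - q if x > q // 2 else x

def quantize(a, D, bounds, q):
    """Center of a: coordinate j of bucket i (bounds[i] <= j < bounds[i+1]) is
    divided by the bucket's constant D[i] and rounded to the nearest integer."""
    center = []
    for i, D_i in enumerate(D):
        for j in range(bounds[i], bounds[i + 1]):
            center.append(math.floor(centered(a[j], q) / D_i + 0.5))
    return tuple(center)
```

Two vectors with the same center differ by less than `D[i]` on every coordinate of bucket `i`, which is what makes their difference small; taking `D[i] = 1` forces equality on that bucket.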
3.3 Main Algorithm

Let us denote by $\mathcal{L}_0$ the set of starting samples, and $\mathcal{L}_i$ the sample list after $i$ reduction steps. The numbers
$d_0 = 0 \le d_1 \le \cdots \le d_k = n$ partition the $n$ coordinates of sample vectors into $k$ buckets. Let $D =
(D_0, \ldots, D_{k-1})$ be the vector of quantization coefficients associated to each bucket.


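Algorithm 2 Main resolution
function Reduce($\mathcal{L}_{in}$, $D_i$, $d_i$, $d_{i+1}$)
    $\mathcal{L}_{out} \leftarrow \emptyset$
    $t[\,] \leftarrow \emptyset$
    for all $(a, b) \in \mathcal{L}_{in}$ do
        $r = \lfloor (a_{d_i}, \ldots, a_{d_{i+1}-1})/D_i \rceil$
        if $t[r] = \emptyset$ then
            $t[r] \leftarrow (a, b)$
        else
            $\mathcal{L}_{out} \leftarrow \mathcal{L}_{out} :: \{(a, b) - t[r]\}$
            $t[r] \leftarrow \emptyset$
        end if
    end for
    return $\mathcal{L}_{out}$
end function

function Solve($\mathcal{L}_0$, $D$, $(d_i)$)
    for $0 \le i < k$ do
        $\mathcal{L}_{i+1} \leftarrow$ Reduce($\mathcal{L}_i$, $D_i$, $d_i$, $d_{i+1}$)
    end for
    return Distinguish($\{b \mid (a, b) \in \mathcal{L}_k\}$)
end function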
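
Algorithm 2 translates almost line by line into executable code. The following is a sketch under assumptions rather than a reference implementation: samples are pairs (list of integers, integer), rounding is half up, the final test is the threshold of Distinguish from Theorem 2, and the names `reduce_step`, `solve`, `bounds` and `bias_bound` are ours.

```python
import math

def centered(x, q):
    """Representative of x mod q of smallest absolute value (positive on ties)."""
    x %= q
    return x - q if x > q // 2 else x

def reduce_step(samples, D_i, lo, hi, q):
    """One Reduce call: pair up samples whose coordinates lo..hi-1 share a center."""
    out = []
    table = {}
    for a, b in samples:
        r = tuple(math.floor(centered(a[j], q) / D_i + 0.5) for j in range(lo, hi))
        if r not in table:
            table[r] = (a, b)
        else:
            # Each stored sample is used once and then discarded, which keeps
            # the output samples independent.
            a2, b2 = table.pop(r)
            out.append(([(x - y) % q for x, y in zip(a, a2)], (b - b2) % q))
    return out

def solve(samples, D, bounds, q, bias_bound):
    """Apply every reduction step, then run the cosine test on the remaining b values."""
    for i, D_i in enumerate(D):
        samples = reduce_step(samples, D_i, bounds[i], bounds[i + 1], q)
    if not samples:
        raise ValueError("not enough samples survived the reductions")
    x = sum(math.cos(2 * math.pi * b / q) for _, b in samples) / len(samples)
    return x >= bias_bound / 2
```

Right after a step with constant `D_i`, every output vector has coordinates of absolute value below `D_i` on the reduced block; the truncation of these coordinates that appears in the analysis is not performed explicitly here.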



In order to allow for a uniform presentation of the BKW algorithm, applicable to different settings,
we do not assume a specific distribution on the secret. Instead, we assume there exists some known $B =
(B_0, \ldots, B_{n-1})$ such that $\sum_i (s_i/B_i)^2 \le n$. Note that this is in particular true if $|s_i| \le B_i$. We shall see how
to adapt this to the standard Gaussian case later on. Without loss of generality, $B$ is non increasing.

There are $k$ phases in our reduction: in the $i$-th phase, the coordinates from $d_i$ to $d_{i+1}$ are reduced. We
define $m = |\mathcal{L}_0|$.

Lemma 5. Solve terminates in time $O(mn\log q)$.

Proof. The Reduce algorithm clearly runs in time $O(|\mathcal{L}|\,n\log q)$. Moreover, $|\mathcal{L}_{i+1}| \le |\mathcal{L}_i|/2$ so that the total
running time of Solve is $O(n\log q\sum_{i=0}^{k} m/2^i) = O(mn\log q)$. □

Lemma 6. Write $\mathcal{L}'_i$ for the samples of $\mathcal{L}_i$ where the first $d_i$ coordinates of each sample vector have been
truncated. Assume $|s_j|D_i < 0.23q$ for all $d_i \le j < d_{i+1}$. If $\mathcal{L}'_i$ is sampled according to the LWE distribution
of secret $s$ and noise parameters $\alpha$ and $\varepsilon \le 1$, then $\mathcal{L}'_{i+1}$ is sampled according to the LWE distribution of the
truncated secret with parameters:

$$\alpha'^2 = 2\alpha^2 + 4\pi^2\sum_{j=d_i}^{d_{i+1}-1}(s_jD_i/q)^2 \quad\text{and}\quad \varepsilon' = 3\varepsilon.$$

On the other hand, if $D_i = 1$, then $\alpha'^2 = 2\alpha^2$.

Proof. The independence of the outputted samples and the uniformity of their vectorial part are clear. Let
$(a, b)$ be a sample obtained by subtracting two samples from $\mathcal{L}'_i$. For $a'$ the vectorial part of a sample, define
$\varepsilon(a')$ such that $\mathbb{E}[\exp(2i\pi(\langle a', s\rangle - b')/q) \mid a'] = (1 + \varepsilon(a'))\exp(-\alpha^2)$. By definition of LWE, $|\varepsilon(a')| \le \varepsilon$, and
by independence:

$$\mathbb{E}[\exp(2i\pi(\langle a, s\rangle - b)/q) \mid a] = \exp(-2\alpha^2)\,\mathbb{E}_{a'-a''=a}[(1 + \varepsilon(a'))(1 + \varepsilon(a''))],$$

with $|\mathbb{E}_{a'-a''=a}[(1 + \varepsilon(a'))(1 + \varepsilon(a''))] - 1| \le 3\varepsilon$.

Thus we computed the noise corresponding to adding two samples of $\mathcal{L}'_i$. To get the noise for a sample from
$\mathcal{L}'_{i+1}$, it remains to truncate coordinates from $d_i$ to $d_{i+1}$. A straightforward induction on the coordinates
shows that this noise is:

$$\exp(-2\alpha^2)\,\mathbb{E}_{a'-a''=a}[(1 + \varepsilon(a'))(1 + \varepsilon(a''))]\prod_{j=d_i}^{d_{i+1}-1}\mathbb{E}[\exp(2i\pi a_js_j/q)].$$

Indeed, if we denote by $a^{(j)}$ the vector $a$ where the first $j$ coordinates are truncated and $\alpha_j$ the noise
parameter of $a^{(j)}$, we have:

$$|\mathbb{E}[\exp(2i\pi(\langle a^{(j+1)}, s^{(j+1)}\rangle - b)/q) \mid a^{(j+1)}] - \exp(-\alpha_j^2)\,\mathbb{E}[\exp(2i\pi a_js_j/q)]|$$
$$= |\mathbb{E}[\exp(-2i\pi a_js_j/q)(\exp(2i\pi(\langle a^{(j)}, s^{(j)}\rangle - b)/q) - \exp(-\alpha_j^2))]|$$
$$\le \varepsilon'\exp(-\alpha_j^2)\,\mathbb{E}[\exp(2i\pi a_js_j/q)].$$

It remains to compute $\mathbb{E}[\exp(2i\pi a_js_j/q)]$ for $d_i \le j < d_{i+1}$. Let $D = D_i$. The probability mass function of
$a_j$ is even, so $\mathbb{E}[\exp(2i\pi a_js_j/q)]$ is real. Furthermore, since $|a_j| \le D$,

$$\mathbb{E}[\exp(2i\pi a_js_j/q)] \ge \cos(2\pi s_jD/q).$$

Simple function analysis shows that $\ln(\cos(2\pi x)) \ge -4\pi^2x^2$ for $|x| \le 0.23$, and since $|s_j|D \le 0.23q$, we get:

$$\mathbb{E}[\exp(2i\pi a_js_j/q)] \ge \exp(-4\pi^2s_j^2D^2/q^2).$$

On the other hand, if $D_i = 1$ then $a_j = 0$ and $\mathbb{E}[\exp(2i\pi a_js_j/q)] = 1$. □
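
Both ingredients of this proof are easy to evaluate numerically. The sketch below is illustrative only: `reduced_noise_sq` applies the update rule of Lemma 6 to one block of the secret, and `log_cos_bound_holds` checks the inequality $\ln(\cos(2\pi x)) \ge -4\pi^2x^2$ at a given point; both names are ours.

```python
import math

def reduced_noise_sq(alpha_sq, secret_block, D_i, q):
    """Squared noise parameter after one reduction step (Lemma 6).

    secret_block holds the secret coordinates s_j for d_i <= j < d_{i+1}.
    """
    if D_i == 1:
        return 2 * alpha_sq  # lossless quantization: only the doubling remains
    rounding = 4 * math.pi ** 2 * sum((s * D_i / q) ** 2 for s in secret_block)
    return 2 * alpha_sq + rounding

def log_cos_bound_holds(x):
    """Check ln(cos(2*pi*x)) >= -4*pi^2*x^2 at the point x (0 <= x < 1/4)."""
    return math.log(math.cos(2 * math.pi * x)) >= -4 * math.pi ** 2 * x * x
```

Evaluating the inequality on a grid shows that it holds up to $x = 0.23$ and fails at $x = 0.24$, which is consistent with the constant 0.23 in the hypothesis $|s_j|D_i < 0.23q$.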

Finding optimal parameters for BKW amounts to balancing various costs: the baseline number of samples 
required so that the final list Lk is non-empty, and the additional factor due to the need to distinguish the 
final error bias. This final bias itself comes both from the blowup of the original error bias by the BKW 
additions, and the “rounding errors” due to quantization. Balancing these costs essentially means solving a 
system. 

For this purpose, it is convenient to set the overall target complexity as $2^{n(x+o(1))}$ for some $x$ to be
determined. The following auxiliary lemma essentially gives optimal values for the parameters of Solve
assuming a suitable value of $x$. The actual value of $x$ will be decided later on.

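Lemma 7. Pick some value $x$ (dependent on LWE parameters). Choose:

$$k \le \left\lfloor\log\left(\frac{nx}{6\alpha^2}\right)\right\rfloor, \qquad m = n2^k2^{nx},$$

$$D_i \le \frac{q\sqrt{x/6}}{\pi B_{d_i}2^{(k-i+1)/2}}, \qquad d_{i+1} = \min\left(d_i + \left\lfloor\frac{nx}{\log(1 + q/D_i)}\right\rfloor,\ n\right).$$

Assume $d_k = n$ and $\varepsilon \le 1/(\beta^2x)^{\log 3}$, and for all $i$ and $d_i \le j < d_{i+1}$, $|s_j|D_i < 0.23q$. Solve runs in time
$O(mn)$ with negligible failure probability.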


Proof. Remark that for all $i$,

$$|\mathcal{L}_{i+1}| \ge (|\mathcal{L}_i| - (1 + q/D_i)^{d_{i+1}-d_i})/2 \ge (|\mathcal{L}_i| - 2^{nx})/2.$$

Using induction, we then have $|\mathcal{L}_i| \ge (|\mathcal{L}_0| + 2^{nx})/2^i - 2^{nx}$ so that $|\mathcal{L}_k| \ge n2^{nx}$.

By induction and using the previous lemma, the input of Distinguish is sampled from a LWE distribution
with noise parameter:

$$\alpha'^2 = 2^k\alpha^2 + 4\pi^2\sum_{i=0}^{k-1}2^{k-i-1}\sum_{j=d_i}^{d_{i+1}-1}(s_jD_i/q)^2.$$

By choice of $k$ the first term is smaller than $nx/6$. As for the second term, since $B$ is non increasing and by
choice of $D_i$, it is smaller than:

$$4\pi^2\sum_{i=0}^{k-1}2^{k-i-1}\,\frac{x/6}{\pi^22^{k-i+1}}\sum_{j=d_i}^{d_{i+1}-1}\left(\frac{s_j}{B_j}\right)^2 \le \frac{x}{6}\sum_{j=0}^{n-1}\left(\frac{s_j}{B_j}\right)^2 \le nx/6.$$

Thus the real part of the bias is superior to $\exp(-nx/3)(1 - 3^k\varepsilon) \ge 2^{-nx/2}$ and hence by Theorem 2,
Distinguish fails with negligible probability. □

Theorem 4. Assume that for all $i$, $|s_i| \le B$, $B \ge 2$, $\max(\beta, \log(q)) = 2^{o(n/\log n)}$, $\beta = \omega(1)$, and $\varepsilon \le 1/\beta^4$.
Then Solve takes time $2^{(n/2+o(n))/\ln(1+\log\beta/\log B)}$.


Proof. We apply Lemma 7, choosing

$$k = \lfloor\log(\beta^2/(12\ln(1 + \log\beta)))\rfloor = (2 - o(1))\log\beta \in \omega(1)$$

and we set $D_i = q/(Bk2^{(k-i)/2})$. It now remains to show that this choice of parameters satisfies the conditions
of the lemma.

First, observe that $BD_i/q \le 1/k = o(1)$ so the condition $|s_j|D_i < 0.23q$ is fulfilled. Then, $d_k \ge n$, which
amounts to:

$$\sum_{i=0}^{k-1}\frac{x}{(k-i)/2 + \log O(kB)} \ge 2x\ln(1 + k/2/\log O(kB)) \ge 1 + k/n = 1 + o(1).$$

If we have $\log k = \omega(\log\log B)$ (so in particular $k = \omega(\log B)$), we get $\ln(1 + k/2/\log O(kB)) = (1 +
o(1))\ln(k) = (1 + o(1))\ln(1 + \log\beta/\log B)$.

Else, $\log k = \Theta(\log\log B) = o(\log B)$ (since necessarily $B = \omega(1)$ in this case), so we get $\ln(1 +
k/2/\log O(kB)) = (1 + o(1))\ln(1 + \log\beta/\log B)$.

Thus our choice of $x$ fits both cases and we have $1/x \le 2\ln(1 + \log\beta)$. Second, we have $1/k = o(\sqrt{x})$ so
$D_i$, $\varepsilon$ and $k$ are also sufficiently small and the lemma applies. Finally, note that the algorithm has complexity
$2^{\Omega(n/\log n)}$, so a factor $n2^k\log(q)$ is negligible. □


This theorem can be improved when the use of the given parameters yields D < 1, since D = 1 already 
gives a lossless quantization. 

Theorem 5. Assume that for all $i$, $|s_i| \le B = n^{b+o(1)}$. Let $\beta = n^c$ and $q = n^d$ with $d \ge b$ and $c + b \ge d$.
Assume $\varepsilon \le 1/\beta^4$. Then Solve takes time $2^{n/(2(c-d+b)/d + 2\ln(d/b) - o(1))}$.


Proof. Once again we aim to apply Lemma 7, and choose $k$ as above:

$$k = \lfloor\log(\beta^2/(12\ln(1 + \log\beta)))\rfloor = (2c - o(1))\log n.$$

If $i < \lceil 2(c - d + b)\log n\rceil$, we take $D_i = 1$, else we choose $q/D_i = \Theta(B2^{(k-i)/2})$. Satisfying $d_k \ge n - 1$ amounts
to:

$$\frac{2x(c - d + b)\log n}{\log q} + \sum_{i=\lceil 2(c-d+b)\log n\rceil}^{k-1}\frac{x}{(k-i)/2 + \log O(B)}$$
$$\ge 2x(c - d + b)/d + 2x\ln((k - 2(c - d + b)\log n + 2\log B)/2/\log O(B))$$
$$\ge 1 + k/n = 1 + o(1).$$

So that we can choose $1/x = 2(c - d + b)/d + 2\ln(d/b) - o(1)$. □
Corollary 1. Given a LWE problem with $q = n^d$, Gaussian errors with $\beta = n^c$, $c > 1/2$ and $\varepsilon \le n^{-4c}$, we
can find a solution in $2^{n/(1/d + 2\ln(d/(1/2+d-c)) - o(1))}$ time.

Proof. Apply Theorem 1: with probability 2/3, the secret is now bounded by $B = O(q\sqrt{n}/\beta \cdot \sqrt{\log n})$. The
previous theorem gives the complexity of an algorithm discovering the secret, using $b = 1/2 - c + d$, and
which works with probability $2/3 - 2^{-\Omega(n)}$. Repeating $n$ times with different samples, the correct secret will
be outputted at least $n/2 + 1$ times, except with negligible probability. By returning the most frequent secret,
the probability of failure is negligible. □

In particular, if $c \le d$, it is possible to quantumly approximate lattice problems within factor $\tilde{O}(n^{c+1/2})$ [46].
Setting $c = d$, the complexity is $2^{n/(1/c + 2\ln(2c) - o(1))}$, so that the constant slowly converges to 0 when $c$ goes
to infinity.
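
To get a feeling for these constants, the exponent of Theorem 5 and its specialization in Corollary 1 can be evaluated directly. This is a small sketch that drops the $o(1)$ term, so the values are only asymptotic guides; the function names are ours.

```python
import math

def theorem5_inverse_x(c, d, b):
    """1/x = 2(c - d + b)/d + 2 ln(d/b), for beta = n^c, q = n^d, B = n^b."""
    if not (d >= b > 0 and c + b >= d):
        raise ValueError("Theorem 5 assumes d >= b > 0 and c + b >= d")
    return 2 * (c - d + b) / d + 2 * math.log(d / b)

def corollary1_exponent_constant(c, d):
    """Constant x such that the running time is about 2^(x*n), using b = 1/2 - c + d."""
    return 1 / theorem5_inverse_x(c, d, 0.5 - c + d)
```

For $c = d$ the constant reduces to $1/(1/c + 2\ln(2c))$; for instance it is about 0.31 at $c = d = 2$ and about 0.23 at $c = d = 4$, illustrating the slow decrease mentioned above.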

A simple BKW using the bias would have a complexity of the analysis of [5] or [4] only 

conjectures for $c > 1/2$. In [5], the authors incorrectly claim a complexity of $2^{cn+o(n)}$ when
$c = d$, because the blowup in the error is not explicitly computed³.

Though the upper bound on $\varepsilon$ is small, provably solving the learning with rounding [?] problem where
$b = \frac{q}{p}\lfloor\frac{p}{q}\langle a, s\rangle\rceil$ for some $p$ seems out of reach⁴.

Finally, if we want to solve the LWE problem for different secrets but with the same vectorial part of the 
samples, it is possible to be much faster if we work with a bigger final bias, since the Reduce part needs to 
be called only once. 


3.4 Some Practical Improvements 

In this section we propose a few heuristic improvements for our main algorithm, which speed it up in practice, 
although they do not change the factor in the exponent of the overall complexity. In our main algorithm, 
after a few iterations of Reduce, each sample is a sum of a few original samples, and these sums are disjoint. 
It follows that samples are independent. This may no longer be true below, and sample independence is lost, 
hence the heuristic aspect. However this has negligible impact in practice. 

First, in the non-binary case, when quantizing a sample to associate it to a center, we can freely quantize 
its opposite as well (i.e. quantize the sample made up of opposite values on each coordinate) as in [4]. 

Second, at each Reduce step, instead of subtracting any two samples that are quantized to the same
center, we could choose samples whose difference is as small as possible, among the list of samples that are
quantized to the same center. The simplest way to achieve this is to generate the list of all differences, and
pick the smallest elements (using the L2 norm). We can thus form a new sample list, and we are free to make
it as long as the original. Thus we get smaller samples overall with no additional data requirement, at the
cost of losing sample independence.⁵

Analyzing the gain obtained using this tactic is somewhat tricky, but quite important for practical 
optimization. One approach is to model reduced sample coordinates as independent and following the 
same Gaussian distribution. When adding vectors, the coordinates of the sum then also follow a Gaussian 
distribution. The (squared) norm of a vector with Gaussian coordinates follows the $\chi^2$ distribution. 
Its cumulative distribution for a $k$-dimensional vector is the regularized gamma function, which amounts to 
$1 - \exp(-x/2)\sum_{i=0}^{k/2-1}(x/2)^i/i!$ for even $k$ (assuming standard deviation 1 for sample coordinates). 

Now suppose we want to keep a fixed proportion of all possible sums. Then using the previous formula 
we can compute the radius R such that the expected number of sample sums falling within the ball of radius 
R is equal to the desired proportion. Thus using this Gaussian distribution model, we are able to predict 
how much our selection technique is able to decrease the norm of the samples. 
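
The radius computation described above can be sketched as follows. This is only an illustration under the Gaussian model with unit standard deviation per coordinate and even dimension $k$; the function names are ours and are not taken from the implementation.

```python
import math

def chi2_cdf_even(x: float, k: int) -> float:
    """CDF at x of the squared norm of a k-dimensional standard Gaussian vector (k even)."""
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive even integer")
    if x <= 0:
        return 0.0
    half = x / 2.0
    term, total = 1.0, 1.0  # term holds half**i / i!
    for i in range(1, k // 2):
        term *= half / i
        total += term
    return 1.0 - math.exp(-half) * total

def squared_radius_for_proportion(p: float, k: int, tol: float = 1e-12) -> float:
    """Squared radius R^2 such that a proportion p of the vectors has squared norm below R^2."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, k) < p:  # grow the bracket until it contains the quantile
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):  # plain bisection on the monotone CDF
        mid = (lo + hi) / 2.0
        if chi2_cdf_even(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return hi
```

For instance, `squared_radius_for_proportion(1 / 1000, k)` gives the predicted squared norm threshold when one sum out of 1000 is kept, in units of the per-coordinate variance of the sums.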

® They claim it is possible to have $\approx \log n$ reduction steps while the optimal number is $\approx 2c\log(n)$, so that their bound is loose 
for $c > 1/2$ and wrong for $c < 1/2$. 

® [19] claims to do so, but it actually assumes the independence of the errors from the vectorial part of the samples. 
^ A similar approach was taken in [5] where the L1 norm was used, and where each new sample is reduced with the 
shortest element ever quantized on the same center. 
Practical experiments show that for a reasonable choice of parameters (namely keeping a proportion 
1/1000 of samples for instance), the standard deviation of sample norms is about twice as high as predicted 
by the Gaussian model for the first few iterations of Reduce, then falls down to around 15% larger. This is 
due to a variety of factors; it is clear that for the first few iterations, sample coordinates do not quite follow 
a Gaussian distribution. Another notable observation is that newly reduced coordinates are typically much 
larger than others. 

While the previous Gaussian model is not completely accurate, the ability to predict the expected norms 
for the optimized algorithm is quite useful to optimize parameters. In fact we could recursively compute 
using dynamic programming what minimal norm is attainable for a given dimension n and iteration count 
a within some fixed data and time complexities. In the end we will gain a constant factor in the final bias. 

In the binary case, we can proceed along a similar line by assuming that coordinates are independent 
and follow a Bernoulli distribution. 

Regarding secret recovery, in practice, it is worthwhile to compute the Fourier transform on a high 
dimension, such that its cost approximately matches that of Solve. On the other hand, for a low enough 
dimension, computing the experimental bias for each possible high probability secret may be faster. 

Another significant improvement in practice can be to apply a linear quantization step just before secret 
recovery. The quantization steps we have considered are linear, in the sense that centers form a lattice. If 
$A$ is a basis of this lattice, in the end we are replacing a sample $(a, b)$ by $(Ax, b)$. We get 
$\langle x, {}^tAs\rangle + e = \langle Ax, s\rangle + e = b - \langle a - Ax, s\rangle$. Thus the dimension of the Fourier transform is decreased as remarked by 
[25], at the cost of a lower bias. Besides, we no longer recover $s$ but $y = {}^tAs$. Of course we are free to change 
$A$ and quantize anew to recover more information on $s$. In some cases, if the secret is small, it may be worth 
simply looking for the small solutions of ${}^tAx = y$ (which may in general cost an exponential time) and testing 
them against available samples. 

In general, the fact that the secret is small can help its recovery through a maximum likelihood test [43]. 
If the secret has been quantized as above however, A will need to be chosen such that As is small (by being 
sparse or having small entries). The probability to get a given secret can then be evaluated by a Monte-Carlo 
approach with reasonable accuracy within negligible time compared to the Fourier transform. 

If the secret has a coordinate with non-zero mean, it should be translated. 

Finally, in the binary case, it can be worthwhile to combine both of the previous algorithms: after 
having reduced vectors, we can assume some coordinates of the secret are zero, and quantize the remainder. 
Depending on the number of available samples, a large number of secret-error switches may be possible. 
Under the assumption that the success probability is independent for each try, this could be another way to 
essentially proceed as if we had a larger amount of data than is actually available. We could thus hope to 
operate on a very limited amount of data (possibly linear in n) for a constant success probability. 

The optimizations above are not believed to change the asymptotic complexity of the algorithm, but have 
a significant impact in practice. 


3.5 Experimentation 

We have implemented our algorithm, in order to test its efficiency in practice, as well as that of the practical 
improvements in subsection 3.4. We have chosen dimension $n = 128$, modulus $q = n^2$, binary secret, and 
Gaussian errors with noise parameter $\alpha = 1/(\sqrt{n/\pi}\log^2 n)$. The previous best result for these parameters, 
using a BKW algorithm with lazy modulus switching, claims a time complexity of $2^{74}$ with $2^{60}$ samples [5]. 

Using our improved algorithm, we were able to recover the secret using $m = 2^{28}$ samples within 13 hours 
on a single PC equipped with a 16-core Intel Xeon. The computation time proved to be devoted mostly to 
the computation of 9 • 10^® norms, computed in fixed point over 16 bits in SIMD. The implementation used 
a naive quantizer. 

In section B, we compare the different techniques to solve the LWE problem when the number of samples 
is large or small. We were able to solve the same problem using BKZ with block size 40 followed by an 
enumeration in two minutes. 
Table 1. Complexity of solving LWE with the Regev parameters [46], i.e. the error distribution is a continuous 
Gaussian of standard deviation […] and with modulus $q \approx n^2$. The reasonable column corresponds to multiplying 
the predicted reduction factor at each step by 1.1, and assuming that the quantizers used reduce the variance by 
a factor of 1.3. The corresponding parameters of the algorithm are shown in columns k (the number of reduction 
steps), log(m) (m is the list size), and log(N) (one vector is kept for the next iteration for each N vectors tested). 
The complexities are expressed as the logarithm of the number of bit operations of each reduction step. Pessimistic uses 
a multiplier of 2 and a naive quantizer (factor 1). Optimistic uses a multiplier of 1 and an asymptotic quantizer 
(factor $2\pi e/12 \approx 1.42$). The asymptotic complexity is […] instead of […]. 


n     q        k    log(m)  log(N)  Reasonable  Optimistic  Pessimistic  Previous [19]
64    4099     16   30      0       39.6        39.6        40.6         56.2
80    6421     17   38      0       47.9        46.0        48.0         66.9
96    9221     18   45      0       55.3        54.3        56.3         77.4
112   12547    18   54      0       64.6        60.6        65.6         89.6
128   16411    19   60      0       70.8        67.8        72.8         98.8
160   25601    20   75      0       86.2        82.2        88.2         119.7
224   50177    21   93      13      117.8       111.8       121.8        164.3
256   65537    22   106     15      133.0       125.0       137.0        182.7
384   147457   24   164     18      194.7       183.7       201.7        273.3
512   262147   25   219     25      257.2       242.2       266.2        361.6


Table 2. Complexity of solving LWE with the Lindner-Peikert parameters [34] but with a number of samples m 
much larger than the cryptosystem provides (2n + 128). The error distribution is 


n     q      s     k    log(m)  log(N)  Reasonable  Optimistic  Pessimistic
192   4099   8.87  19   68      5       84.2        79.2        84.2
256   6421   8.35  20   82      8       101.7       95.7        103.7
320   9221   8.00  22   98      9       119.0       112.0       122.0


Table 3. Complexity of solving LWE with binary ({0,1}) secret with the Regev parameters [46]. 


n     q        k    log(m)  log(N)  Reasonable  Optimistic  Pessimistic  Previous [5]
128   16411    16   28      0       38.8        38.8        39.8         74.2
256   65537    19   52      0       64.0        62.0        67.0         132.5
512   262147   22   99      0       112.2       104.2       117.2        241.8

3.6 Extension to the norm L2 


One can think that in Lemma 6, the condition \sj\Di < 0.2Sq is artificial. In this section, we show how to 
remove it, so that the only condition on the secret is $\|s\| < \sqrt{n}B$, which is always better. The basic idea is to 
use rejection sampling to transform the coordinates of the vectorial part of the samples into small discrete 
Gaussians. The following randomized algorithm works only for some moduli, and is expected to take more 
time, though the asymptotic complexity is the same. 


1: function AcCEPT(a,u,(7,fc) l> For all i, at € 

2: return true with probability exp(—7r(y~], min(u^, (u; + 1)^) — {ui + aifk)‘^)la^) 

3: end function 

4: function REDUCEL2(/li„,Z)i,di,di+i,cri) 

5: t[] ^ 0 

6: for all (a, b) € Cin do 

7 . j. ^ ^ 

8: PuSH(t[r], (a,6)) 

9: end for 

10 : £-out ^ 0 

11: while |{r £ ^ 0 }\ > {q//3 do 

12: Sample x and y according to ^ 

13: repeat 

14: Sample u and v uniformly in 

15: until t[u + x] ^ 0 and t[v + y\ ^ 0 

16: (uo, &o) POP(t[u + a:]) 

17: (ui, &i) ^ Pop(t[r» + y]) 

18: if AcCEPT(ao mod q/Di,u,Gi,q/Di) and AcCEPT(ai mod q/Di,v,Gi,q/Di) then 

19. klout (nQ Ui, b(j .. Hout 

20: end if 

21: end while 

22: return Lout 

23: end function 


Lemma 8. Assume that Di\q, at is larger than some constant and 

\Ci\ > 2nmax(nlog((3'/A)(g/A)‘^’+^“‘^Sexp(5(dj+i -di)/ai)). 

Write C[ for the samples of Li where the first di coordinates of each sample vector have been truncated. If 
L'i is sampled according to the LWE distribution with secret s and noise parameters a and e, then is 

sampled according to the LWE distribution of the truncated secret with parameters: 

di + 1—1 

a'^ = 2a^ + 2 tt ^ (sjaiDi/q)^ and e' = 3e. 

j=di 

On the other hand, if Di = 1, then a'^ = 2a^ . Furthermore, ReduceL2 runs in time (!l(nlogy|£i|) and 
|£i+i| > \Li\ exp(—5(di+i — di)/ai)/6 except with probability 

Proof. On lines 16 and 17, Qq mod q/Di and mod q/Di are uniform and independent. On line 19, because 
of the rejection sampling theorem, we have {q/D.i)uF{ao mod q/Di) and (g/Zli)u + (ai mod q/Di) sampled 
according to D^di^^-d^ Also, the unconditional acceptance probability is, for sufficiently large (Ji, 

exp(-5(cii+i - di)/ai) 
where the first inequality comes from Poisson summation on both $\rho$. Next, the bias added by truncating the 
samples is the bias of the scalar product of $(q/D_i)u + (a_0 \bmod q/D_i) - (q/D_i)v - (a_1 \bmod q/D_i)$ with the 
corresponding secret coordinates. Therefore, using a Poisson summation, we can prove that $\alpha'$ is correct. 

The loop on line 15 is terminated with probability at least 2/3 each time so that the Hoeffding inequality 
proves the complexity. 

On line 10, the Hoeffding inequality proves that minrt[r] > \Ci\/{2{q/ holds except with 
probability Under this assumption, the loop on lines 11-21 is executed at least |/li|/6 times. Then, 

the Hoeffding inequality combined with the bound on the unconditional acceptance probability proves the 
lower bound on |/li+i|. □ 

Theorem 6. Assume that ||s|| < ^/nB, B > 2, max(/3,log(g)) = /3 = Cli(1), and e < 

Then, there exists an integer Q smaller than i?”2^” such that if q is divisible by this integer, we can solve 
Decision-LWE in time 

$2^{(n/2+o(n))/\ln(1+\log\beta/\log B)}$. 


Proof. We use ai = Vlog /3, m = 2n^6^ exp(5n/cri)2"^, 

k= [log(/3^/(121n(l-Elog/3)))J = (2 - o(l)) log/3 G w(l) 

and we set Di = qj(Bk2^^~'-^^’^'), di+i = min(di -I- L iog(g/p.) J j n) and Q = Jli We can then see as 

in Theorem 4 that these choices lead to some a; = 1/(2 — o(l))/ln(l -|- log/3/log H). Finally, note that the 
algorithm has complexity so a factor 2n^6^ log((?) exp(5n/•\/log(/3)) is negligible. □ 

The condition on q can be removed using modulus switching [13], unless q is tiny. 

Theorem 7. If $B\beta < q$, then the previous theorem holds without the divisibility condition. 

Proof. Let p > g be the smallest modulus such that the previous theorem applies and c = y/np/q. For 
each sample {a,b), sample x from Ilz>*_a/q,<r and use the sample {p/qa + px,p/qb) with the algorithm of 
the previous theorem. Clearly, the vectorial part has independent coordinates, and the probability that one 
coordinate is equal to y is proportional to pc_(p/qh — y). Therefore, the Kullback-Leibler distance with the 
uniform distribution is If the original error distribution has a noise parameter a, then the new error 

distribution has noise parameter a' with 

^ ■nq^sflp^ < + ■nB'^q^/p^ < n/2//3^ -I- irnB^/q^ 

i 

so that p is only reduced by a constant. □ 

4 Applications to Lattice Problems 

We first show that BDDb^^ is easier than LWE for some large enough modulus and then that UniqueSVP^ ^ 
and GapSVP^ ^ are easier than BDDs ^. 


4.1 Variant of Bounded Distance Decoding 

The main result of this subsection is close to the classic reduction of [46]. However, our definition of LWE 
allows us to simplify the proof and to gain a constant factor in the decoding radius. The use of the KL divergence 
instead of the statistical distance also allows us to gain a constant factor when we need an exponential number 
of samples, or when A* is really small. 

The core of the reduction lies in Lemma 9, assuming access to a Gaussian sampling oracle. This hypothesis 
will be taken care of in Lemma 10. 
Lemma 9. Let A be a basis of the lattice $\Lambda$ of full rank $n$. Assume we are given access to an oracle outputting 
a vector sampled under the law cind a > and to an oracle solving the LWE problem in 

dimension n, modulus q > 2, noise parameter a, and distortion parameter f which fails with negligible 
probability and use m vectors if the secret s verifies |si| < Bi. 

Then, if we are given a point x such that there exists s with v = As —x, ||v|| < ^Jlfnaqfa, |si| < Bi and 
/9cr/g(A\{0} + v) < ^exp(—a^)/2, we are able to find s in at mostmn calls to the Gaussian sampling oracle, 
n calls to the LWE solving oracle, with a probability of failure n^/me + and complexity 0{mn^ + n'^) 

for some c. 

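Proof. Let $y$ be sampled according to $D_{\Lambda^*,\sigma}$ and let $v = As - x$. Then 

$\langle y, x\rangle = \langle y, As\rangle - \langle y, v\rangle = \langle {}^tAy, s\rangle - \langle y, v\rangle$ 

and since $y \in \Lambda^*$, ${}^tAy \in \mathbb{Z}^n$. We can thus provide the sample $({}^tAy, \langle y, x\rangle)$ to the LWE solving oracle. 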

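The probability of obtaining the sample ${}^tAy \in (\mathbb{Z}/q\mathbb{Z})^n$ is proportional to $\rho_\sigma(q\Lambda^* + y)$. Using the Poisson 
summation formula, 

$\rho_\sigma(q\Lambda^* + y) = \sigma^n \det(\Lambda/q) \sum_{z \in \Lambda/q} \exp(2i\pi\langle z, y\rangle)\rho_{1/\sigma}(z).$ 

Since $\sigma \ge q\eta_\epsilon(\Lambda^*) = \eta_\epsilon(q\Lambda^*)$, we have 

$1 - \epsilon \le \sum_{z \in \Lambda/q} \cos(2\pi\langle z, y\rangle)\rho_{1/\sigma}(z) \le 1 + \epsilon.$ 

Therefore, with Lemma 2, the KL-divergence between the distribution of ${}^tAy \bmod q$ and the uniform 
distribution is less than $2\epsilon^2$. 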

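Also, with $c \in y + q\Lambda^*$, 

$E[\exp(2i\pi(\langle {}^tAy, s\rangle - \langle y, x\rangle)/q) \mid {}^tAy \bmod q] = E_{y \sim D_{q\Lambda^* + c, \sigma}}[\exp(2i\pi\langle y, v/q\rangle)].$ 

Let $f(y) = \exp(2i\pi\langle y, v/q\rangle)\rho_\sigma(y)$ so that the bias is $f(q\Lambda^* + c)/\rho_\sigma(q\Lambda^* + c)$. Using the Poisson summation 
formula on both terms, this is equal to 

$\frac{\sum_{y \in \Lambda/q} \rho_{1/\sigma}(y - v/q)\exp(-2i\pi\langle y, c\rangle)}{\sum_{y \in \Lambda/q} \rho_{1/\sigma}(y)\cos(2\pi\langle y, c\rangle)}.$ 

In this fraction, the numerator is at distance at most $\xi\exp(-\alpha^2)/(1 + \epsilon)$ to $\exp(-\pi\sigma^2\|v\|^2/q^2) \ge \exp(-\alpha^2)$ 
and the denominator is in $[1 - \epsilon; 1 + \epsilon]$. 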

Using Lemma 3 with an algorithm which tests if the returned secret is equal to the real one, the failure 
probability is bounded by yGne. Thus, the LWE solving oracle works and gives s mod g. 

Let $x' = (x - A(s \bmod q))/q$ so that $s' = (s - (s \bmod q))/q$ and $\|v'\| \le \|v\|/q$. Therefore, the reduction also 
works. If we repeat this process $n$ times, we can solve the last closest vector problem with Babai's algorithm, 
which reveals $s$. □ 
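
The way a dual lattice vector $y$ turns the target $x = As - v$ into an LWE-like sample $({}^tAy, \langle y, x\rangle)$ with $\langle y, x\rangle = \langle {}^tAy, s\rangle - \langle y, v\rangle$ can be checked on a toy example. The sketch below is only illustrative: it works in dimension 2 with exact rationals, picks $y$ deterministically instead of sampling it from a discrete Gaussian, and all names and numerical values are ours.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_vec(M, v):
    return [dot(row, v) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def lwe_sample_from_dual(A, k, x):
    """Return (tA y, <y, x>, y) for the dual lattice vector y = A^{-T} k, k integral."""
    dual_basis = transpose(inverse_2x2(A))  # columns of A^{-T} span the dual lattice
    y = mat_vec(dual_basis, k)
    return mat_vec(transpose(A), y), dot(y, x), y

# Hypothetical toy instance: basis A (columns are basis vectors), secret s, small offset v.
A = [[3, 1], [1, 2]]
s = [5, -7]
v = [Fraction(1, 10), Fraction(-1, 5)]
x = [p - e for p, e in zip(mat_vec(A, s), v)]  # x = A s - v
k = [2, -3]
a_vec, b, y = lwe_sample_from_dual(A, k, x)
```

Here `a_vec` equals `k`, which is integral as expected, and `b` differs from `dot(a_vec, s)` only by the small term `dot(y, v)`, which plays the role of the error.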

In the previous lemma, we required access to a oracle. However, for large enough a, this hypothesis 

comes for free, as shown by the following lemma, which we borrow from [13]. 

Lemma 10. If we have a basis A of the lattice $\Lambda$, then for $\sigma \ge O(\sqrt{\log n}\,\|A\|)$, it is possible to sample in 
polynomial time from $D_{\Lambda,\sigma}$. 

We will also need the following lemma, due to Banaszczyk [10]. 

Lemma 11. For a lattice $\Lambda$, $c \in \mathbb{R}^n$, and $t \ge 1$, 

$\frac{\rho((\Lambda + c) \setminus B(0, t\sqrt{n/(2\pi)}))}{\rho(\Lambda)} \le \exp(-n(t^2 - 2\ln t - 1)/2) \le \exp(-n(t - 1)^2/2).$ 
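Proof. For any $s \ge 1$, using Poisson summation: 

$\frac{\rho_s(\Lambda + c)}{\rho(\Lambda)} = s^n \frac{\sum_{x \in \Lambda^*} \rho_{1/s}(x)\exp(2i\pi\langle x, c\rangle)}{\rho(\Lambda^*)} \le s^n \frac{\sum_{x \in \Lambda^*} \rho_{1/s}(x)}{\sum_{x \in \Lambda^*} \rho(x)} \le s^n.$ 

Then, 

$s^n\rho(\Lambda) \ge \rho_s((\Lambda + c) \setminus B(0, t\sqrt{n/(2\pi)})) \ge \exp(t^2(1 - 1/s^2)n/2)\,\rho((\Lambda + c) \setminus B(0, t\sqrt{n/(2\pi)})).$ 

And therefore, using $s = t$: 

$\frac{\rho((\Lambda + c) \setminus B(0, t\sqrt{n/(2\pi)}))}{\rho(\Lambda)} \le \exp(-n(t^2(1 - 1/s^2) - 2\ln s)/2) = \exp(-n(t^2 - 2\ln t - 1)/2) \le \exp(-n(t - 1)^2/2),$ 

where the last inequality stems from $\ln t \le t - 1$. □ 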
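
As a sanity check, the tail bound $\rho((\Lambda + c) \setminus B(0, t\sqrt{n/(2\pi)}))/\rho(\Lambda) \le \exp(-n(t^2 - 2\ln t - 1)/2)$ can be evaluated numerically in the simplest case $\Lambda = \mathbb{Z}$, $n = 1$. The sketch below assumes $\rho(x) = \exp(-\pi x^2)$ and truncates the sums to a finite window; the function names are ours.

```python
import math

def rho(x: float) -> float:
    return math.exp(-math.pi * x * x)

def tail_ratio(c: float, t: float, n_terms: int = 50) -> float:
    """rho((Z + c) outside the ball of radius t*sqrt(1/(2*pi))) divided by rho(Z)."""
    radius = t * math.sqrt(1.0 / (2.0 * math.pi))
    window = range(-n_terms, n_terms + 1)
    outside = sum(rho(k + c) for k in window if abs(k + c) >= radius)
    return outside / sum(rho(k) for k in window)

def banaszczyk_bound(t: float, n: int = 1) -> float:
    return math.exp(-n * (t * t - 2.0 * math.log(t) - 1.0) / 2.0)
```

Comparing `tail_ratio(c, t)` with `banaszczyk_bound(t)` for a few shifts `c` and values `t >= 1` illustrates how loose the bound is in dimension 1; the bound becomes meaningful when $n$ grows.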


Theorem 8. Assume we have a LWE solving oracle of modulus q > 2", parameters /3 and f which needs m 
samples. 

If we have a basis A of the lattice A, and a point x such that As — x = v with ||v|| < (1 — l/n)Ai//3/t < 
Ai/2 and 4exp(—n(t — l/fi — 1)^/2) < ^exp(—n/2//3^), then with calls to the LWE solving oracle with 
secret s, we can find s with probability of failure 2^/rnex.p>{—n{t^ — 21nt — l)/2) for any t > 1 + l/jS. 

Proof. Using Lemma 11, we can prove that a — t^/rij2jTxj\x < » 7 e(A*) for e = 2exp(—n(t^ — 21nt — l)/2) 
and 

Pi/<T (A \ {0} + v) < 2 exp ( - n(t(l - 1/^/t) - 1)^/2). 

Using LLL, we can find a basis B of A so that ||B*|| < 2"/^/Ai, and therefore, it is possible to sample in 
polynomial time from D\ qcr since <? > 2" for sufficiently large n. 

The LLL algorithm also gives a non zero lattice vector of norm I < 2"Ai. For i from 0 to n^, we let 
A = £{1 — l/n)b we use the algorithm of Lemma 9 with standard deviation tqy/nj2pKj\, which uses only 
one call to the LWE solving oracle, and return the closest lattice vector of x in all calls. 

Since £(1 — l/n)" < 2"exp(—n)Ai < Ai, with 0 < * < be the smallest integer such that A = 
£{1 — l/nY < Ai, we have A > (1 — l/n)Ai. Then the lemma applies since 

||v|| < (1 - l/n)\i/j3/t < fiql(tq\Jnl^j-KlX) = Xjtlfi. 


Finally, the distance bound makes As the unique closest lattice point of x. 


□ 


Using self-reduction, it is possible to remove the 1 — 1/n factor [36]. 

Corollary 2 . It is possible to solve in time 2 ("/ 2 +o("))/in(i+iog/3/logS) p — 0^(1), fj = 

and log B = C>(log fi). 

Proof. Apply the previous theorem and Theorem 4 with some sufficiently large constant for t, and remark 
that dividing (3 by some constant does not change the complexity. □ 


Note that since we can solve LWE for many secrets in essentially the same time as for one, we have the 
same property for BDD. 
4.2 UniqueSVP 

The following reduction was given in [36] and works without modification. 

Theorem 9. Given a oracle, it is possible to solve UniqueSVP^'^^” in polynomial time of n and 

P. 

Proof. Let $2\beta \ge p \ge \beta$ be a prime number, and let $A$ and $s$ be such that $\|As\| = \lambda_1$. If $s = 0 \bmod p$ then $A(s/p)$ is a 
non-zero lattice vector, shorter than $\|As\|$, which is impossible. Let $i$ be such that $s_i \ne 0 \bmod p$, and 

$A' = [a_0, \ldots, a_{i-1}, pa_i, a_{i+1}, \ldots, a_{n-1}],$ 

which generates a sublattice $\Lambda'$ of $\Lambda$. If $k \ne 0 \bmod p$, then $kAs \notin \Lambda'$ so that $\lambda_1(\Lambda') \ge \lambda_2(\Lambda)$. 

If $s' = (s_0, \ldots, \lfloor s_i/p \rfloor, \ldots, s_{n-1})$, then $A's' + (s_i \bmod p)a_i = As$ and $\|s'\|_\infty \le B$. Therefore, calling 
BDD to find the point of $\Lambda'$ closest to $(s_i \bmod p)a_i$ yields $s'$. By trying every value of $i$ and $(s_i \bmod p)$, $np$ 
calls to the BDD oracle are enough to solve UniqueSVP. □ 
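
The identity $A's' + (s_i \bmod p)a_i = As$ used in this proof is easy to check on integers. The following sketch builds $A'$, $s'$ and the BDD target from a basis given as a list of column vectors; the helper names and the sample values in the usage note are ours and purely illustrative.

```python
def combine(cols, coeffs):
    """Return sum_j coeffs[j] * cols[j] for a basis given as a list of column vectors."""
    dim = len(cols[0])
    return [sum(c * col[r] for c, col in zip(coeffs, cols)) for r in range(dim)]

def sublattice_instance(cols, s, i, p):
    """Build A' (column i multiplied by p), s' and the BDD target (s_i mod p) * a_i."""
    cols_prime = [[p * e for e in col] if j == i else list(col)
                  for j, col in enumerate(cols)]
    # Python's // and % are floor division and non-negative remainder,
    # so s_i = p * (s_i // p) + (s_i % p) also holds for negative s_i.
    s_prime = [sj // p if j == i else sj for j, sj in enumerate(s)]
    target = [(s[i] % p) * e for e in cols[i]]
    return cols_prime, s_prime, target
```

For example, with `cols = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]`, `s = [7, -5, 3]`, `i = 1` and `p = 3`, adding `target` to `combine(cols_prime, s_prime)` gives back `combine(cols, s)`.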

The reductions for both $\mathrm{BDD}^{\|\cdot\|}$ and $\mathrm{UniqueSVP}^{\|\cdot\|}$ work the same way. 


4.3 GapSVP 

The following reduction is a modification of the reduction given in [44], which comes from [23] but does not 
fit our context. 

First, we need a lemma proven in [23] : 

Lemma 12. The volume of the intersection of two balls of radius 1 divided by the volume of one ball is at 
least 

o o 

where d < 1 is the distance between the centers. 

Lemma 13. Given access to an oracle solving and let T) be an efficiently samplable distribution 

over points of IP whose infinity norm is smaller than b and for ^ < 1 and let 


e 


min Pr 

||s|U<flx~X> 


■ ^ V{x + s) 




Then, we can solve GapSVP'^''^”^ with negligible probability of failure using 0{dn/e/— d?) calls 

to the BDD oracle. 


Proof. Let K = 0{dn/e/f{l — A a basis of the lattice A. The algorithm consists in testing K times 

the oracle : sample x with law V and check if BDD(Ax + e) = x for e sampled uniformly within a ball 
centered at the origin and of radius 1/d. If any check is wrong, return that Ai < 1, else return Ai > /3/d. 

Clearly, the algorithm runs in the given complexity, and if Ai > /3/d, it is correct. 

So, let s yf 0 such that ||As|| < 1. Let x and e be sampled as in the algorithm. With probability over e 
greater than (1 — d^)"/^d/3, ||Ax + e — A(x + s)|| < 1/d. We condition on this event. 

Then, with probability at least e, 1/^ > fc = f and we also condition on this event. Let p be 

the probability that BDD(Ax + e) = x. The probability of failure is at least 


(1 -p)/(l + k) +pk/{l + k) > mm{k, 1)/(1 + k) > ^. 

Therefore, the probability of failure of the algorithm is 

We could have used the uniform distribution for V, but a discrete gaussian is more efficient. 


□ 
Lemma 14. Let $\Lambda$ be a one-dimensional lattice. Then, 

$E_{x \sim D_\Lambda}[\|x\|^2] \le \frac{1}{2\pi}.$ 

Proof. Let $f(x) = \|x\|^2\rho(x)$. Then, using a Poisson summation: 

$\frac{f(\Lambda)}{\rho(\Lambda)} = \frac{\sum_{x \in \Lambda^*}(\frac{1}{2\pi} - \|x\|^2)\rho(x)}{\rho(\Lambda^*)} \le \frac{1}{2\pi}.$ 

□ 
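
The bound $E_{x \sim D_\Lambda}[\|x\|^2] \le 1/(2\pi)$ for a one-dimensional lattice can be illustrated numerically. The sketch below takes $\Lambda = \ell\mathbb{Z}$ for a spacing $\ell$, assumes $\rho(x) = \exp(-\pi x^2)$, and truncates the sums to a finite window; the function name and the window size are ours.

```python
import math

def second_moment(spacing: float, n_terms: int = 200) -> float:
    """E[x^2] for x drawn from the discrete Gaussian on spacing * Z (truncated sums)."""
    points = [spacing * k for k in range(-n_terms, n_terms + 1)]
    weights = [math.exp(-math.pi * x * x) for x in points]
    return sum(w * x * x for w, x in zip(weights, points)) / sum(weights)
```

For fine lattices such as `second_moment(0.05)` the value is very close to $1/(2\pi) \approx 0.159$, and it decreases as the spacing grows.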


Lemma 15. LefD be such that'D{'x.) oc exp(—||x|p/(2cr^)) /or aZZx with ||x||oo < B. Then!) is polynomially 
samplable for B — R > 2a, and using the definitions of Lemma 13, we have for f, = exp(—ni?^/(2o’^) — 
2-/nRla), e > exp(—2nexp(—((i? — R)/a — l)^/2))/2. 


Proof. Using the Banaszczyk lemma , we have : 


St~itBexp(-xV(2a^)) ^ St~itBexp(-xV(2a^)) 

Ef=-Bexp(-a:2/(2a2)) " E.6zexp(-xV(2(T2)) 


> 1 — exp(—((_B — R)/a 


1)V2). 


So that : 

Pr [||x||oo < B - R]> exp(-2nexp(-((B - R)/a - 1)^/2)). 

x~'D 

Since the discrete Gaussian distribution over Z is polynomially samplable, this is also the case for T) since 

B — R> 2a. 

We now condition the distribution over ||x||oo < B — R. For some s such that ||s||oo < R and N = ||s||, 
we study the variable 


£ = 2a^ ln(exp(—||x + s||^/(2cr^))/exp(—||x||^/(2cr^))) = — 2(x,s). 


By symmetry : 

E[^] =E[Af2_2(a.^s)] 

Let be the distribution over the first coordinate and v = Sq. Then, using the previous lemma : 

Var[r;^ - xv] = - xv - v'^Y] < < a'^v'^. 

Summing over all coordinates, we have : 

Var[£] < 


By the Chebyshev inequality, we get : 


Pr[|£- 


■iV^I > ANa] < -. 


And the claim follows. 

Theorem 10. One can solve any GapSVP'' 


•11=0^_ 

(B^log log log /3/ log log /3) ,/3 


in time 


2{.nl2+o{n))/ ln(l+log P/\ogB) 


□ 


for (3 = p = a;(l), B>2. 

Proof. Use the previous lemma with a = il/v^TETogTog^ and Corollary 2 with /?' = P/log{P), so that it is 
sufficient to decode 2 °*^"/*°siog/3) points. □ 
Theorem 11. If it is possible to solve in polynomial time, then it is possible to solve in randomized 

polynomial time GapSVP'^'''” . -. 

B /logn 

Proof. Use a = i?/-\/3 In n. □ 

We now prove the corresponding theorem in norm L2. 

Theorem 12. Let P be the uniform distribution over the points x of'IP with | |a;| | < B. Then, P is samplable 
in time 0{n^B^ log B) and for any s with ||s|| < ii and x sampled according to P, we have for n sufficiently 
large 

Pr[||a; + s|| < B]> - v^/2)/(B + v^/2)”. 

Proof. It is easy to see that a dynamic programming algorithm using 0{nB^) operations on integers smaller 
than i?" can determine the i-th lattice point of the ball, ordered lexicographically. Therefore, P is samplable. 

Let E be the right circular cylinder of center s/2, with generating segments parallel to s, of length R and 
radius r = \/B'^ — R?, and H the same cylinder with length R — \fnl2 and radius r — yjnj^. For the lattice 
point a; G Z”, let be the axis-aligned cube of length 1 centered on x. Let F be the union of all F^, such 
that F,^ C E. We have |F;nZ"| = vol(F) and H CF. Therefore, |F;nZ"| > Y^_^{r - ^I2Y-^{R- ^jT). 
Also, E is a subset of the intersection of the balls of radius B and centers 0 and s. Using Lemma 16 and 
Vn-i > Vn for n sufficiently large, the result follows. □ 

Corollary 3. One can solve GapSVP'' '' , - , B = wfl) and B = in time 

2{n/2+o{n))/ ln(l-|-log/3/ logB) 


Proof. Apply Theorem 6 with a reduction to BDD with parameter jB' = /3/log(/3) and B' = max(i3, log ,5). 
Then, apply the previous theorem with Lemma 13. □ 

Corollary 4. It is possible to solve any GapSVP'' '' .— with c > 0 in time 


Proof. Use the previous corollary with B = 2V^^°® ” log log n and (B = n'^. □ 

Theorem 13. If it is possible to solve BDD^''^“ in polynomial time, then it is possible to solve in randomized 
polynomial time GapSVP^^'^^” - - 

1 O In/ /— /t / /I 

BInj logn 


5 Other applications 

5.1 Low density subset-sum problem 

Definition 10. We are given a vector $a \in \mathbb{Z}^n$ whose coordinates are sampled independently and uniformly 
in $[0; M)$, and $\langle a, s\rangle$ where the coordinates of $s$ are sampled independently and uniformly in $\{0, 1\}$. The goal 
is to find $s$. The density is defined as $d = n/\log_2 M$. 

Note that this problem is trivially equivalent to the modular subset-sum problem, where we are given 
$\langle a, s\rangle \bmod M$, by trying all possible $\lfloor \langle a, s\rangle/M \rfloor$. 
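
The equivalence with the modular problem can be made concrete: since $0 \le \langle a, s\rangle < nM$, the quotient $\lfloor \langle a, s\rangle/M \rfloor$ takes at most $n$ values, so the integer target is one of $n$ candidates. The sketch below assumes the density $d = n/\log_2 M$; the function names are ours.

```python
import math
import random

def density(n: int, M: int) -> float:
    return n / math.log2(M)

def random_instance(n: int, M: int, rng: random.Random):
    """Sample a, s and the integer target <a, s> as in the definition above."""
    a = [rng.randrange(M) for _ in range(n)]
    s = [rng.randrange(2) for _ in range(n)]
    return a, s, sum(ai * si for ai, si in zip(a, s))

def candidate_targets(residue: int, M: int, n: int):
    """All integers congruent to residue mod M in [0, n*M): one per possible quotient."""
    return [residue + k * M for k in range(n)]
```

Given only `total % M`, running an integer subset-sum solver on each element of `candidate_targets(total % M, M, n)` therefore costs at most a factor `n`.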

In [31,16], Lagarias et al. reduce the subset sum problem to UniqueSVP, even though this problem was 
not defined at that time. We will show a reduction to which is essentially the same. First, we 

need two geometric lemmata. 

Lemma 16. Let $B_n(r)$ be the number of points of $\mathbb{Z}^n$ of norm smaller than $r$, and $V_n$ the volume of the unit 
ball. Then, 

$B_n(r) \le V_n\left(r + \frac{\sqrt{n}}{2}\right)^n.$ 
Proof. For each $x \in \mathbb{Z}^n$, let $E_x$ be a cube of length 1 centered on $x$. Let $E$ be the union of all the $E_x$ whose 
center $x$ lies in the ball of center 0 and radius $r$. Then $\mathrm{vol}(E) = B_n(r)$, and since every point of $E_x$ is at distance 
at most $\sqrt{n}/2$ from $x$, $E$ is included in the ball of center 0 and radius $r + \sqrt{n}/2$; the claim is proven. □ 


Lemma 17. For $n \ge 1$ we have 

$V_n = \frac{\pi^{n/2}}{(n/2)!} \le \left(\sqrt{\frac{2\pi e}{n}}\right)^n.$ 


Theorem 14. Using one call to a oracle with any c < and d = o(1), and polynomial 

time, it is possible to solve a subset-sum problem of density d, with negligible probability of failure. 


Proof. With the matrix : 



for some C > and b = (1/2,..., 1/2, C(a,s)), return BDD(A,b). It is clear that ||As —b|| = 

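$\sqrt{n}/2$. Now, let $x$ be such that $\|Ax\| = \lambda_1$. If $\langle a, x\rangle \ne 0$, then $\lambda_1 = \|Ax\| \ge C$, therefore $\beta \ge c2^{1/d}$. Else, 
$\langle a, x\rangle = 0$. Without loss of generality, $x_0 \ne 0$; we let $y = -(\sum_{i>0} a_ix_i)/x_0$ and the probability over $a$ that 
$\langle a, x\rangle = 0$ is: 

$\Pr[\langle a, x\rangle = 0] = \Pr[a_0 = y] = \sum_{z=0}^{M-1} \Pr[y = z]\Pr[a_0 = z] \le \frac{1}{M}.$ 

Therefore, the probability of failure is at most, for sufficiently large $n$, 

$B_n(\beta\sqrt{n}/2)/M \le \left(\sqrt{2\pi e/n}\right)^n \left(c2^{1/d}\sqrt{n}/2 + \sqrt{n}/2\right)^n / 2^{n/d} = \left(\sqrt{\pi e/2}\,(c + 2^{-1/d})\right)^n = 2^{-\Omega(n)}.$ □ 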


Corollary 5. For any $d = o(1)$ and $d = \omega(\log n/n)$, we can solve the subset-sum problem of density $d$ with 
negligible probability of failure in time $2^{(n/2+o(n))/\ln(1/d)}$. 

The cryptosystem of Lyubashevsky et al. [37] uses > lOnlog^n and is therefore broken in time 
$2^{(\ln 2/2+o(1))n/\log\log n}$. Current lattice reduction algorithms are slower than this one when $d = \omega(1/(\log n \log\log n))$. 


5.2 Sample Expander and application to LWE with binary errors 

Definition 11. Let q be a prime number. The problem Small-DecisionLWE is to distinguish (A,b) with A 
sampled uniformly with n columns and m rows, b = As + e such that ||s|p + ||e|p < and ||s|| < -,/nB 
from (A,b) sampled uniformly. Also, the distribution (s,e) is efficiently samplable. 

The problem Small-SearchLWE is to find s given (A,b) with A sampled uniformly and b = As + e with 
the same conditions on s and e. 

These problems are generalizations of Binary LWE where $s$ and $e$ have coordinates sampled uniformly in 
$\{0, 1\}$. In this case, remark that each sample is a root of a known quadratic polynomial in the coordinates of 
$s$. Therefore, it is easy to solve this problem when $m \ge n^2$. For $m = O(n)$, a Gröbner basis algorithm applied 
on this system will (heuristically) have a complexity of 2^^") [3]. For m = 0{n/ logn) and q = it has 

been shown to be harder than a lattice problem in dimension 0(n/logn) [40]. 
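
The quadratic relation mentioned above can be written explicitly: if $b = \langle a, s\rangle + e \bmod q$ with $e \in \{0, 1\}$, then $(b - \langle a, s\rangle)(b - \langle a, s\rangle - 1) = 0 \bmod q$, which is a quadratic polynomial in the coordinates of $s$ with known coefficients. A minimal sketch, with hypothetical helper names and a toy modulus:

```python
import random

def binary_lwe_sample(s, q, rng):
    """One LWE sample with uniform a and an error drawn uniformly from {0, 1}."""
    a = [rng.randrange(q) for _ in s]
    e = rng.randrange(2)
    b = (sum(ai * si for ai, si in zip(a, s)) + e) % q
    return a, b

def quadratic_residual(a, b, s, q):
    """(b - <a, s>) * (b - <a, s> - 1) mod q, which vanishes when the error is 0 or 1."""
    t = (b - sum(ai * si for ai, si in zip(a, s))) % q
    return (t * (t - 1)) % q
```

Treating every monomial $s_is_j$ as a fresh unknown turns these relations into a linear system with about $n^2/2$ unknowns, which is why on the order of $n^2$ samples suffice.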

We will first prove the following theorem ®, with the coordinates of x and y distributed according to a 
samplable V : 

Theorem 15. Assume there is an efficient distinguisher which uses $k$ samples for Decision-LWE (respectively 
a solver for Search-LWE) with error distribution $\langle s, y\rangle + \langle e, x\rangle$ of advantage (resp. success probability) $\epsilon$. 

Then, either there is an efficient distinguisher for Decision-LWE with samples and secret taken uniformly, 
and error distribution F in dimension m—1 and with n-\-m samples of advantage — q~'^ — q~™'; or there 
is an efficient distinguisher of advantage e — f /or Small-Decision-LWE (resp. solver of success probability e — f 
for Small-Search-LWE/. 

* In [19], the authors gave a short justification of a similar claim which is far from proven. 
Sample Expander The reduction is a generalization and improvement of Döttling's reduction [17], which 
was made for LPN. 


Lemma 18 (Hybrid lemma [22]). We are given a distinguisher between the distributions $\mathcal{D}_0$ and $\mathcal{D}_1$ of 
advantage $\epsilon$ which needs $m$ samples. Then, there exists a distinguisher between the distributions $\mathcal{D}_0$ and $\mathcal{D}_1$ 
of advantage $\epsilon/m$ which needs one sample, $m$ samples of distribution $\mathcal{D}_0$ and $m$ samples of distribution $\mathcal{D}_1$. 

Proof. The distinguisher samples an integer $i$ uniformly between 0 and $m - 1$. It then returns the output of 
the oracle, whose input is $i$ samples of $\mathcal{D}_0$, the given sample, and $m - i - 1$ samples of $\mathcal{D}_1$. 

Let $p_i$ be the probability that the oracle outputs 1 if its input is given by $i$ samples of $\mathcal{D}_0$ followed by $m - i$ 
samples of $\mathcal{D}_1$. Then, if the sample comes from $\mathcal{D}_0$, the probability of outputting 1 is $\frac{1}{m}\sum_{i=1}^{m} p_i$. Else, it is 
$\frac{1}{m}\sum_{i=0}^{m-1} p_i$. Therefore, the advantage of our algorithm is $\frac{1}{m}|p_m - p_0| = \frac{\epsilon}{m}$. □ 
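
The telescoping argument of the hybrid lemma can be verified exactly on a toy case. In the sketch below, the two distributions are Bernoulli distributions over bits and the $m$-sample oracle is an arbitrary Boolean function; both choices, like the function names, are ours and only serve as an illustration.

```python
from fractions import Fraction
from itertools import product

def oracle_accept_probability(oracle, dists):
    """Exact Pr[oracle(samples) = 1] when sample j is a bit drawn from Bernoulli(dists[j])."""
    total = Fraction(0)
    for bits in product((0, 1), repeat=len(dists)):
        weight = Fraction(1)
        for bit, p in zip(bits, dists):
            weight *= p if bit else 1 - p
        if oracle(bits):
            total += weight
    return total

def hybrid_advantage(oracle, p0, p1, m):
    """Advantage of the one-sample distinguisher built from an m-sample oracle."""
    def accept(p_given):
        # Average over the uniformly chosen position i of the challenge sample:
        # i samples of D0, then the challenge, then m - i - 1 samples of D1.
        return sum(
            oracle_accept_probability(oracle, [p0] * i + [p_given] + [p1] * (m - i - 1))
            for i in range(m)
        ) / m
    return abs(accept(p0) - accept(p1))
```

With exact rationals, `hybrid_advantage(oracle, p0, p1, m)` equals the advantage of the $m$-sample oracle divided by `m`, whatever the oracle is.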

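In the following, we will let, for $\mathcal{D}$ some samplable distribution over $\mathbb{Z}/q\mathbb{Z}$, $x$ be sampled according to 
$\mathcal{D}^m$, $y$ be sampled according to $\mathcal{D}^n$, $z$ according to $(e|s)$, $u$ be sampled uniformly in $(\mathbb{Z}/q\mathbb{Z})^m$, $v$ be sampled 
uniformly in $\mathbb{Z}/q\mathbb{Z}$, $w$ in $(\mathbb{Z}/q\mathbb{Z})^n$. 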

Definition 12. The problem Knapsack-LWE is to distinguish between (G,c) where G is always sampled 
uniformly in and c is either uniform, or sampled according to GH'""'"". 

The problem Extended-LWE is to distinguish between $(A, {}^txA + y, z, \langle (x|y), z\rangle)$ and $(A, w, z, \langle (x|y), z\rangle)$. 

The problem First-is-errorless-LWE is to distinguish between $(A, {}^txA + y, u, \langle x, u\rangle)$ and $(A, w, u, \langle x, u\rangle)$. 

The following lemma comes from [39]; we give a sketch of their proof. 

Lemma 19. The problems Knapsack-LWE and Decision-LWE in dimension to with n + m samples where are 
equivalent: there are reductions in both ways which reduce the advantage by q~^~^. 

Proof. Given the Decision-LWE problem $(A, b)$, we sample a uniform basis $G$ of the left kernel of $A$ and 
output $(G, Gb)$. If $b = As + e$, then $Gb$ is distributed as $Ge$. 

Given the Knapsack-LWE problem $(G, c)$, we sample a uniform basis $A$ of the right kernel of $G$, and a 
uniform $b$ such that $Gb = c$, and output $(A, b)$. If $c = Ge$, then $b$ is distributed as $As + e$ where $s$ is 
uniform. 

Both reductions map the distributions to their counterparts, except when A or G are not full rank, which 
happens with probability at most hence the result. □ 

The following lemma is a slight modification of the lemma given in [6].

Lemma 20. Given a distinguisher of advantage $\epsilon$ for Extended-LWE, there is a distinguisher of advantage
$\frac{\epsilon(1-1/q) - 2q^{-n}}{q}$ for Decision-LWE in dimension $m$, uniform secret, with $n+m$ uniform samples of error
distribution $\mathcal{D}$.


Proof. Using the previous lemma, we can assume that we want to solve a Knapsack-LWE problem. Let $(G,c)$
be its input, with $c = Ge$. We start by sampling $z$, $t$ uniformly over $(\mathbb{Z}/q\mathbb{Z})^m$ and $e'$ according to $\mathcal{D}^{n+m}$.

We then compute $G' = G - t\,{}^t z$ and $c' = c - \langle z,e'\rangle t$. Remark that $G'$ is uniform and $c' = Ge - \langle z,e'\rangle t =
G'e + \langle z, e - e'\rangle t$. Therefore, if $\langle z,e'\rangle = \langle z,e\rangle$, which happens with probability at least $1/q$, $(G',c')$ comes
from the same distribution as $(G,c)$. Also, $\langle z,e\rangle$ is known. Else, since $t$ is uniform, $(G',c')$ is uniform.

We then perform the previous reduction to a Decision-LWE problem, where $\langle z,e\rangle$ is known. Finally, we
use Theorem 1 to reduce to Extended-LWE. If it does not output $n$ samples, which happens with probability at
most $1/q$, our distinguisher returns a uniform boolean. Else, remark that we have $(x|y) = e$ so that $\langle (x|y), z\rangle$
is known and we use the given distinguisher. □
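
The rerandomization identity used in this proof, $c' = G'e + \langle z, e - e'\rangle t$, can be verified numerically. The following is a minimal sketch under assumed toy parameters (a small prime `q`, a 4 x 9 matrix, ternary vectors standing in for samples of the error distribution); the helper names `inner`, `mat_vec` and `rerandomize` are hypothetical and not taken from the paper.

```python
import random

def inner(u, v, q):
    """Inner product of two integer vectors, reduced modulo q."""
    return sum(x * y for x, y in zip(u, v)) % q

def mat_vec(M, v, q):
    """Matrix-vector product modulo q, with M given as a list of rows."""
    return [inner(row, v, q) for row in M]

def rerandomize(G, c, z, t, e_prime, q):
    """Return (G', c') with G' = G - t * z^T and c' = c - <z, e'> t."""
    h = inner(z, e_prime, q)
    G2 = [[(G[i][j] - t[i] * z[j]) % q for j in range(len(z))] for i in range(len(t))]
    c2 = [(c[i] - h * t[i]) % q for i in range(len(t))]
    return G2, c2

rng = random.Random(1)
q, rows, cols = 97, 4, 9          # toy sizes, chosen only for the demonstration
G = [[rng.randrange(q) for _ in range(cols)] for _ in range(rows)]
e = [rng.choice([-1, 0, 1]) for _ in range(cols)]        # stand-in for the real error
e_prime = [rng.choice([-1, 0, 1]) for _ in range(cols)]  # the guessed error e'
z = [rng.randrange(q) for _ in range(cols)]
t = [rng.randrange(q) for _ in range(rows)]
c = mat_vec(G, e, q)

G2, c2 = rerandomize(G, c, z, t, e_prime, q)
diff = [x - y for x, y in zip(e, e_prime)]
rhs = [(u + inner(z, diff, q) * t[i]) % q for i, u in enumerate(mat_vec(G2, e, q))]
identity_holds = (c2 == rhs)
print(identity_holds)
```

When $\langle z, e - e'\rangle = 0$ the extra term vanishes and $(G', c')$ is again a knapsack instance with the same error; otherwise the uniform $t$ masks $c'$.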

Lemma 21. Assume there is a distinguisher of advantage $\epsilon$ for First-is-errorless-LWE. Then, there is a
distinguisher of advantage at least $\epsilon(1 - 1/q) - q^{-m}$ for Decision-LWE in dimension $m-1$ with uniform secret
and $n+m$ uniform samples of error distribution $\mathcal{D}$.

Proof. We take a Decision-LWE problem in dimension $m-1$ with $n+m$ samples, and extend it by one
coordinate, the corresponding secret coordinate being sampled uniformly. We then switch the error and the
secret, as in Theorem 1, and return a random boolean if the reduction does not output $n$ samples.

We have the errorless sample $(e_{m-1}, s_{m-1})$ with $e_{m-1} = (0,\dots,0,1)$. If we feed this sample to our
reduction, it outputs an errorless sample $(u, \langle x,u\rangle)$, and the vectorial part is distributed as a column of a
uniform invertible matrix. Therefore, the statistical distance between the distribution of $u$ and the uniform
distribution is at most $q^{-m}$. □

Theorem 16. Assume there is an efficient distinguisher which uses $k$ samples for Decision-LWE (respectively
a solver for Search-LWE) with error distribution $\langle s,y\rangle + \langle e,x\rangle$ of advantage (resp. success probability) $\epsilon$.

Then, either there is an efficient distinguisher for Decision-LWE with samples and secret taken uniformly,
and error distribution $\mathcal{D}$ in dimension $m-1$ and with $n+m$ samples, of advantage $-\,q^{-m}$; or there
is an efficient distinguisher of advantage $\epsilon\,-$ for Small-Decision-LWE (resp. solver of success probability $\epsilon\,-$
for Small-Search-LWE).

Proof. Let

- $\mathcal{D}_0$ be the distribution given by $(w, \langle w,s\rangle + \langle s,y\rangle + \langle e,x\rangle)$
- $\mathcal{D}_1$ be the distribution given by $({}^t xA + y, \langle x,b\rangle)$
- $\mathcal{D}_2$ be the distribution given by $({}^t xA + y, \langle x,u\rangle)$
- $\mathcal{D}_3$ be the distribution given by $(w, r)$.

Let $\epsilon_i$ be the advantage of the given distinguisher between $\mathcal{D}_i$ and $\mathcal{D}_{i+1}$.

Let $(A, c, (e|s), \langle (x|y), (e|s)\rangle)$ be an instance of the Extended-LWE problem. We compute $b = As - e$, and
use the hybrid lemma with distributions $\mathcal{D}_0$ and $\mathcal{D}_1$, and sample $(c, \langle c,s\rangle + \langle (x|y), (e|s)\rangle)$. If $c = {}^t xA + y$,
we have $\langle c,s\rangle = \langle x,b\rangle + \langle x,e\rangle + \langle y,s\rangle$ so that the sample is distributed according to $\mathcal{D}_1$. Therefore, we have
a distinguisher for Extended-LWE of advantage $\epsilon_0/k$.

Clearly, there is a distinguisher of advantage $\epsilon_1$ against Small-Decision-LWE.

Let $(A, c, u, r)$ be an instance of the First-is-errorless-LWE problem. The hybrid lemma with distributions
$\mathcal{D}_2$ and $\mathcal{D}_3$, and sample $(c,r)$, shows that there exists an efficient distinguisher for First-is-errorless-LWE with
advantage $\epsilon_2/k$.

Since $\epsilon_0 + \epsilon_1 + \epsilon_2 \ge \epsilon$, using the previous lemmata, and the fact that Decision-LWE is harder in dimension
$m$ than $m-1$, the theorem follows.

For the search version, the indistinguishability of $\mathcal{D}_0$ and $\mathcal{D}_1$ is sufficient. □

Döttling's reduction has a uniform secret, so that a problem solvable in a given time is transformed into
a problem where the best known algorithm is much slower. We do not have such a dramatic loss here.

Applications 

Lemma 22. Let $\mathcal{D} = D_{\mathbb{Z},\sigma}$ for $\sigma > 1$.

Then, the advantage of a distinguisher for Decision-LWE of dimension $m$ with $m+n$ samples of noise
distribution $\mathcal{D}$ is at most $q^n/\sigma^{n+m}$. Furthermore, the bias of $\langle (s|e), (x|y)\rangle$, for fixed $s$ and $e$, is at least
$\exp(-\pi(\|s\|^2 + \|e\|^2)\sigma^2/q^2)$.

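Proof. We have $\mathcal{D}^{m+n}(a) \le \mathcal{D}^{m+n}(0) = 1/\rho_\sigma(\mathbb{Z})^{m+n}$ and $\rho_\sigma(\mathbb{Z}) = \sigma\rho_{1/\sigma}(\mathbb{Z}) \ge \sigma$ using a Poisson summation.

The first property is then a direct application of the leftover hash lemma, since $q$ is prime.

The bias of $\lambda\mathcal{D}$ can be computed using a Poisson summation as:

$\frac{1}{\rho_\sigma(\mathbb{Z})}\sum_{a\in\mathbb{Z}} \rho_\sigma(a)\cos(2\pi\lambda a/q) = \frac{\rho_{1/\sigma}(\mathbb{Z}+\lambda/q)}{\rho_{1/\sigma}(\mathbb{Z})} \ge \exp(-\pi\lambda^2\sigma^2/q^2).$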


Therefore, the second property follows from the independence of the coordinates of $x$ and $y$. □

Corollary 6. Let $q$, $n$ and $m$ be such that $(m-3)\log q/(n+m) - \log k = \omega(1)$ and $m\log q/(n+m) = o(n/\log n)$.
Then, we can solve Small-Decision-LWE in time

$2^{(n/2+o(n))/\ln(1 + ((m-3)\log q/(n+m) - \log k)/\log B)}$

with negligible probability of failure.

Proof. We use the previous lemma with a = so that we have {3 = 

The algorithm from Theorem 6 needs samples, so that the advantage of the potential distinguisher for 
Decision-LWE is /q for f — 2“"/'^ ; while the previous lemma proves it is less than □ 
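
The bias bound of Lemma 22 for a single coordinate, $E[\cos(2\pi\lambda a/q)] \ge \exp(-\pi\lambda^2\sigma^2/q^2)$ for $a$ drawn from the discrete Gaussian of parameter $\sigma$, can be checked numerically. The sketch below is an illustration only: the function names are hypothetical, the Gaussian weight is taken to be $\rho_\sigma(a) = \exp(-\pi a^2/\sigma^2)$, and the infinite sum is truncated at about $12\sigma$, where the remaining mass is negligible.

```python
import math

def gaussian_bias(lam, sigma, q):
    """E[cos(2*pi*lam*a/q)] for a distributed as the discrete Gaussian D_{Z,sigma}."""
    bound = int(12 * sigma) + 1  # truncation point (an assumption of this sketch)
    num = 0.0
    den = 0.0
    for a in range(-bound, bound + 1):
        weight = math.exp(-math.pi * a * a / sigma ** 2)  # rho_sigma(a)
        num += weight * math.cos(2 * math.pi * lam * a / q)
        den += weight
    return num / den

def bias_lower_bound(lam, sigma, q):
    """The lower bound exp(-pi * lam^2 * sigma^2 / q^2) stated in the lemma."""
    return math.exp(-math.pi * lam ** 2 * sigma ** 2 / q ** 2)

# Toy parameters (lam, sigma, q), chosen only for the demonstration.
checks = [(3, 1.5, 17), (5, 3.0, 101), (1, 1.2, 7)]
results = [(gaussian_bias(l, s, q), bias_lower_bound(l, s, q)) for l, s, q in checks]
for bias, bound in results:
    print(round(bias, 6), round(bound, 6))
```

For larger $\sigma$ the two quantities become numerically indistinguishable, which is why the test below allows a small floating-point tolerance.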

The NTRU cryptosystem [27] is based on the hardness of finding two polynomials $f$ and $g$ whose coefficients
are bounded by 1 given $h = f/g \bmod (X^n - 1, q)$. Since $hg = 0$ with an error bounded by 1, we can
apply previous algorithms in this section to heuristically recover $f$ and $g$ in subexponential time. This is
the first subexponential time algorithm for this problem since it was introduced back in 1998.

Corollary 7. Assume we have a Search-LWE problem with at least nlogg-l- r samples and Gaussian noise 
with a = and q = n'^. Then, we can solve it in time for any failure probability in 

2-"°''’ +q~G 

Proof. First, apply a secret-error switching (Theorem 1) and assume we lose at most r samples. Apply the 
previous corollary with B = 7 jd-c-i-o(i) jg ^ correct bound for the secret, except with probability 2“"°^^'. 

Lemma 11 shows that < logger^, except with probability so that f3 = We can then use 

a — 0(1) and apply Theorem 4. □ 

Note that this corollary can in fact be applied to a very large class of distributions, and in particular 
to the learning with rounding problem, while the distortion parameter is too large for a direct application 
of Theorem 4. 

Also, if the reduction gives a fast (subexponential) algorithm, one may use $\sigma = 2\sqrt{n}$ and assume that
there is no quantum algorithm solving the corresponding lattice problem in dimension $m$.

Even more heuristically, one can choose $\sigma$ to be the lowest such that if the reduction does not work, we
have an algorithm faster than the best known algorithm for the same problem.

A LWE in Small Dimension

If the dimension $n$ is small ($n = O(\log\beta)$), and in particular if $n$ is one, we can use the same kind of
algorithm. Such algorithms have been described in [42,8] and an unpublished paper of Bleichenbacher.

The case $n = 1$ is interesting in cryptography for analyzing the security of DSA [42] or proving the hardness of
recovering the most significant bits of a Diffie-Hellman key.

The principle is to reduce at each step some coordinates and partially reduce another one. If $q$ is too
large, then the reduction of the last coordinate has to be changed: we want to have this coordinate uniformly
distributed in the ball of dimension 1 and radius $R$ (i.e. $[-R; R]$). To this end, for each sample, if the ball of
radius $R'$ centered at the sample is included in the ball of radius $R$ and if it contains at least one other sample,
we choose uniformly a sample in this ball, remove it, and add the difference to the output
list. Otherwise, we add the element to our structure. The last coordinate of the output samples is
clearly uniform over a ball of radius $R'$.
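
The reduction of the last coordinate described above can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' implementation: samples are pairs `(a, b)` with `a` the last coordinate and `b` the scalar part modulo `q`, the structure is a plain sorted list (so removal is linear rather than logarithmic), and the function name and toy parameters are hypothetical.

```python
import bisect
import random

def reduce_last_coordinate(samples, R, R_prime, q, rng):
    """Pair each sample with a stored one at distance <= R_prime and output differences."""
    stored = []  # sorted list of (a, b); a balanced tree would give logarithmic updates
    out = []
    for a, b in samples:
        lo, hi = a - R_prime, a + R_prime
        if -R <= lo and hi <= R:  # the ball of radius R' around a lies inside [-R, R]
            i = bisect.bisect_left(stored, (lo, -1))
            j = bisect.bisect_right(stored, (hi, q))
            if i < j:  # the ball contains at least one stored sample
                a2, b2 = stored.pop(rng.randrange(i, j))  # uniform choice, then removal
                out.append((a - a2, (b - b2) % q))
                continue
        bisect.insort(stored, (a, b))
    return out

rng = random.Random(0)
q, R, R_prime, s = 10007, 1000, 30, 4321  # toy parameters for the demonstration
samples = []
for _ in range(500):
    a = rng.randint(-R, R)
    e = rng.randint(-3, 3)  # small error, standing in for the real noise
    samples.append((a, (a * s + e) % q))

reduced = reduce_last_coordinate(samples, R, R_prime, q, rng)

def centered(v, q):
    """Representative of v modulo q in (-q/2, q/2]."""
    v %= q
    return v - q if v > q // 2 else v

print(len(reduced), max(abs(a) for a, _ in reduced))
```

Each output sample is still a valid sample for the same secret, with an error equal to the difference of two input errors, which is what the test checks.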

Finally, we apply a Fourier transform to the samples: for various potential secrets $x$, if we denote by
$(a, b)$ the samples where $a$ is the last coordinate of the vectorial part and $b$ the scalar part, we compute the
experimental bias $\Re(\mathbb{E}[\exp(2i\pi(ax - b)/q)])$ and output the $x$ which maximizes it.

We can analyze the Fourier stage of the algorithm with $a$ uniformly distributed in a ball of radius $R$, an
odd integer. The bias for $x$ is $\mathbb{E}[\exp(2i\pi(e + (s - x)a)/q)]$ where $e = as - b$ is the error, which is equal to the
bias of the error times $\mathbb{E}[\exp(2i\pi(s - x)a/q)]$. We take $x$ regularly distributed so that the interval between
two $x$ is smaller than $q/R/2$, and we can compute all biases in one Fourier transform.

Let us consider an $x$ such that $|s - x| < q/R/4$. The introduced bias is the bias of the error times
a factor which is lower bounded by a universal constant. However, for $|s - x| > q/R$, the introduced
bias is upper bounded by a constant, strictly smaller than the previous one. Therefore, the Fourier stage
determines an $x$ such that $|s - x| < q/R$.
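
The Fourier stage for $n = 1$ can be illustrated with a direct (quadratic-time) evaluation of the experimental bias on a regular grid of candidates. This is a sketch under assumed toy parameters; a fast Fourier transform would replace the double loop in practice, and the function name `fourier_stage`, the grid spacing and the noise model are choices made for this example.

```python
import math
import random

def fourier_stage(samples, q, R):
    """Return the grid point x maximizing the empirical bias Re E[exp(2i*pi*(a*x - b)/q)]."""
    step = max(1, q // (2 * R) - 1)  # grid spacing smaller than q/R/2
    best_x, best_bias = None, -2.0
    for x in range(0, q, step):
        bias = sum(math.cos(2 * math.pi * (a * x - b) / q) for a, b in samples)
        bias /= len(samples)
        if bias > best_bias:
            best_x, best_bias = x, bias
    return best_x, best_bias

rng = random.Random(2)
q, R, s = 10007, 101, 1234  # toy modulus, odd radius and secret
samples = []
for _ in range(2000):
    a = rng.randint(-R, R)          # last coordinate, uniform in the ball of radius R
    e = round(rng.gauss(0, 2))      # small rounded Gaussian error
    samples.append((a, (a * s + e) % q))

x_hat, bias_hat = fourier_stage(samples, q, R)
dist = min((x_hat - s) % q, (s - x_hat) % q)  # distance between x_hat and s modulo q
print(x_hat, round(bias_hat, 3), dist)
```

With these parameters the bias near the secret is a constant well above the bias of any candidate farther than `q / R` from `s`, so the maximizer localizes the secret to an interval of that width.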

We first determine the secret up to an approximation factor $q/R$ using a fast Fourier
transform on $\approx 2R$ points. We are therefore reduced to a smaller instance where the secret is bounded by
$q/R$ and we can choose $R' = R^2$ for instance. Using one more time a Fourier transform on $2R'/R = 2R$
points, we can recover the secret up to an approximation factor of $q/R' = q/R^2$. We can continue in this way until
we get the final coordinate of the secret; repeating this process $n - 1$ times, we have the whole secret.

We define $\beta = \sqrt{\log q}/\alpha$ and express all variables as functions of $q$.

Theorem 17. We can solve Search-LWE with n = 1, j5 = a;(l), /3 = g°F/iog</) distortion parameter 
e < in time 

$q^{(1/2+o(1))/\log\beta}$

with failure probability 

Proof. We analyze here the first iteration of the algorithm. 

We set k = [log(/3^/log/3/3)J the number of iterations, m = 0(logg2^g“) the number of input samples 
and Ri = g/i?* such that the vectorial part of the input of the fth reduction is sampled uniformly in a 
ball of the largest odd radius smaller than Ri. If we have N samples, the Hoeffding inequality shows that 
one reduction step outputs at least A^(l/2 — 1/R — logg/i?^) — R samples, except with probability 
Therefore, we can use R — q^. For simplicity, we then set Rk = 3 so that R < g^/“. The bias of the error is 
at least exp(—2*^a^)(l — e3^) > exp(—logg/log/I/3)/2 so that taking x = 1/k works. 

The overall complexity is C>(log^ g/S^gV*) which is g(i/ 2 +o(i))/ iog/ 3 ^ □ 

B Solving LWE with Lattice Reduction 

Lattice reduction consists in finding a basis of a lattice with short vectors. The best algorithm [20] uses a 
polynomial number of times an oracle which returns the shortest non-zero vector of a lattice of dimension 
d, and finds a vector whose norm is inferior to ( 7 ^ -I- for a lattice of volume V and 

dimension n, with 7 ^ < 2B^J'^ the square of the Hermite constant. 

Heuristically, we use the BKZ algorithm which is stopped after a polynomial number of calls to the 
same oracle. The i-th vector of the orthonormalized basis has a length proportional to roughly 7 ® and the 
Gaussian heuristic gives Bd = 7 ‘^G+i)/^ so that 7 « and the first vector has a norm « ry-nl 2 yiln ^ 

The oracle takes time $2^{(c+o(1))d}$ with $c = 2$ for a deterministic algorithm [41], $c = 1$ for a randomized
algorithm [2], $c = 0.2972$ heuristically [29] and $c = 0.286$ heuristically and quantumly [30].

Footnote (Appendix A): Choosing a uniform sample in the ball can be done in logarithmic time using a balanced binary tree which stores in each node the size of its subtree.

B.1 Primal algorithm

Given $A$ and $b = As + e + qy$, with the coordinates of $s$ and $e$ being Gaussians of standard deviation
$q\alpha = o(q)$, we use the lattice generated by



We then heuristically assume that the $i$-th vector of the orthonormalized basis has a length at least
$d^{-(n+m)/2/d}\,V^{1/(n+m)}$. In this orthogonalized basis, the error vector has also independent Gaussian coordinates
of standard deviation $\alpha q$. Therefore, Babai's algorithm will work with high probability if this length is greater than
$\alpha q\sqrt{\log n}$.

Thus, we need to maximize $d^{-(n+m)/2/d}\,q^{m/(n+m)}$. So we use $m+n = \sqrt{2dn\log q/\log d}$ and the maximum
is

$2^{-\sqrt{n\log q\log d/2/d}}\;2^{\log q\,(1-\sqrt{n\log d/2/d/\log q})} = q\,2^{-\sqrt{2n\log q\log d/d}}.$

If $\alpha = n^{-a}$ and $q = n^b$, we need $a\log n = \sqrt{2bn\log n\log d/d}$ and we deduce $d/n = 2b/a^2 + o(1)$, so that the
running time is $2^{(2cb/a^2+o(1))n}$.

When we have a binary secret and error, since only the size of the vectors counts, it is as if $q\alpha = O(1)$ and
the complexity is therefore $2^{(2c/b+o(1))n}$. When the secret is binary and the error is a Gaussian, the problem
is clearly even more difficult.

In dimension 1, this gives $\log 1/\alpha = \sqrt{2\log q\log d/d}$ so that $d = (2 + o(1))\log q\log(\log q/\log^2\alpha)/\log^2\alpha$,
which gives a polynomial time algorithm for $\log^2\alpha = \Omega(\log q\log\log\log q/\log\log q)$ and a subexponential
one for $\log^2\alpha = \omega(\log\log q)$.

We can use the same algorithm for subset-sum with density d with the lattice defined in subsection 5.1 : 
applying the same heuristics gives a complexity of 


B.2 Dual algorithm 


The attack consists in using lattice reduction to find sums of samples which are equal to zero on most
coordinates, and then using FindSecret to find the secret on the remaining coordinates. It has been described
for instance in [42].

More precisely, we select $n$ truncated samples which form an invertible matrix $A$ of size $n$, and $m$ others
which form the matrix $B$ with $m$ columns. We then search for a short vector $v \neq 0$ of the lattice generated

We have $v = \binom{x}{y}$ and $A(-x) + By = A(-x + A^{-1}By) = A\cdot 0 = 0 \bmod q$, so we deduce from $v$ a small
sum of samples which is equal to zero on $n$ coordinates. We can then deduce a sample of low dimension
whose bias depends on $\|v\|$. As $\|v\|$ is at most $\approx q^{n/(n+m)}\,2^{(n+m)\log d/d/2}$, we select $n + m = \sqrt{2nd\log q/\log d}$ and

$\|v\| \approx 2^{\sqrt{2n\log q\log d/d}}.$

Finally, using $\alpha = n^{-a}$, $q = n^b$ and $d = O(n)$, we search $d$ so that $2^{\sqrt{2n\log q\log d/d}} = O(\sqrt{n}/\alpha)$, so
$d/n = 8b/(1+2a)^2 + o(1)$.


Footnote (Section B.1, binary secret with Gaussian error): In this case, a standard technique is to scale the coordinates of the lattice corresponding to $e$, in order to have
each coordinate of the distance to the lattice of the same average size.

When the errors are binary, we can apply the same formula with $\alpha = O(1/q)$ and the complexity is
therefore $2^{(2cb/(1/2+b)^2+o(1))n}$.

Observe that this algorithm is always asymptotically faster than the previous one, but requires an exponential
number of samples. This disadvantage can be mitigated by finding many short vectors in the same
lattice, which hopefully will not affect the behaviour of FindSecret. Another solution is to use a
sample-preserving search-to-decision reduction [39] so that the product of the time and the square of the success
probability is the same.
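
The comparison between the two attacks can be made concrete from the block-size ratios derived above, $d/n = 2b/a^2$ for the primal algorithm and $d/n = 8b/(1+2a)^2$ for the dual one, with $\alpha = n^{-a}$, $q = n^b$ and an SVP oracle of cost $2^{(c+o(1))d}$. The sketch below ignores all $o(1)$ terms; the function names and the sample values of $a$, $b$ are assumptions chosen for illustration.

```python
def primal_block_ratio(a, b):
    """Leading term of d/n for the primal algorithm (alpha = n^-a, q = n^b)."""
    return 2 * b / a ** 2

def dual_block_ratio(a, b):
    """Leading term of d/n for the dual algorithm (alpha = n^-a, q = n^b)."""
    return 8 * b / (1 + 2 * a) ** 2

def time_exponent(ratio, c=0.2972):
    """Leading coefficient of n in log2 of the running time, for SVP cost 2^(c d)."""
    return c * ratio

grid = [(a, b) for a in (0.5, 1.0, 2.0) for b in (1.0, 2.0, 4.0)]
comparison = [(a, b, primal_block_ratio(a, b), dual_block_ratio(a, b)) for a, b in grid]
for a, b, primal, dual in comparison:
    print(a, b, round(time_exponent(primal), 4), round(time_exponent(dual), 4))
```

Since $(1+2a)^2 > 4a^2$ for every $a > 0$, the dual ratio is always the smaller one, which matches the observation above; setting $a = b$ in the dual ratio gives the binary-error exponent $2b/(1/2+b)^2$.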

In dimension 1, this gives $\log(d/\alpha^2) = \sqrt{8\log q\log d/d}$ and we deduce

$d = (8 + o(1))\log q\log(\log q/\log^2(d/\alpha^2))/\log^2(d/\alpha^2)$

and this is a polynomial time algorithm for $\log^2\alpha = \Omega(\log q\log\log\log q/\log\log q)$, but a subexponential one
for any $\log(q)/\alpha^2 =$


B.3 Graphical Comparison between lattice algorithms and BKW 


[Figure: Asymptotical complexity of solving LWE]

The horizontal axis represents the standard deviation of the noise, parametrized by $x$. The vertical
axis represents the modulus, with $y$ such that $q = n^y$.

In the upper right quadrant delimited by the black line, parameters are such that the dual attack with
heuristic classical SVP is more efficient than BKW. Below the black lines, Gröbner basis algorithms are
sub-exponential. Above the red line, there is a quantum reduction with hard lattice problems [46].

The rainbow-colored lines are contour lines for the complexity of the best algorithm among BKW and
the dual attack. Each curve follows a choice of parameters with the same overall complexity, indexed by $k$,
where $k$ varies from 0 to 5 by increments of 0.3.

[Figure: Asymptotical complexity of solving LWE with $n\log q$ samples]

C LPN 

Definition 13. An LPN distribution is defined as an LWE distribution with $q = 2$. The error follows a Bernoulli
distribution of parameter $p$.

Theorem 18. If we have p = 1/2 — and p > 1/n, then we solve Decision-LPN in time 

$2^{(n+o(n))/\log(n/\log(1-2p))}$


with negligible probability of failure. 

Proof. We use di = min(i [nxj, n), m = 712^+"^ and select k = [log(/3^/log/3/3)J where = n/2/log(l — 
2p) = a;(l). The algorithm works if dk = n, so we chose x = 1/k+l/n = (1 + o(l))/log(/3^). Finally, the bias 
of the samples given to Distinguish is < /3^/log/3/3n/2//l^ = n/6/log/3 < nx/3 so that Distinguish 
works with negligible probability of failure. □ 

In particular, for p = 1/2 — 2“”°''^', we have a complexity of 

Theorem 19. If we have $p < 1/4$, then we solve Search-LPN in time $O(n^4(1-p)^{-n})$ with negligible
probability of failure.

Proof. We repeat $n(1-p)^{-n}$ times the following algorithm: apply the secret-error switching algorithm
(Theorem 1) over $34n$ new samples, and run Distinguish over the scalar part of the $33n$ samples. If the
distinguisher outputs that it is an LPN distribution, then the secret is the zero vector with failure probability
$2^{-33n/32}$, and we can recover the original secret from this data. If the new secret is not zero, then the
distinguisher outputs that the distribution is uniform, except with probability $2^{-33n/32}$. The probability
that no new secret is equal to zero is at most

$\exp(-(1-p)^n\, n(1-p)^{-n}) = e^{-n}.$

Using the union bound, the overall probability of failure is negligible. □
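
The bound on the probability that no repetition produces a zero secret can be checked numerically. The sketch assumes, as in the proof, that each repetition yields a zero new secret with probability $(1-p)^n$ independently, and it treats the number of repetitions $n(1-p)^{-n}$ as a real number; the function name and the tested parameters are hypothetical.

```python
import math

def log_prob_no_zero_secret(n, p):
    """Natural log of (1 - (1-p)^n)^(n (1-p)^-n), the chance that every repetition fails."""
    hit = (1 - p) ** n   # probability that one repetition gives the zero secret
    trials = n / hit     # number of repetitions, n (1-p)^-n
    return trials * math.log1p(-hit)

params = [(10, 0.05), (20, 0.1), (40, 0.2), (30, 0.24)]
log_probs = [log_prob_no_zero_secret(n, p) for n, p in params]
for (n, p), lp in zip(params, log_probs):
    print(n, p, round(lp, 4))
```

Because $\ln(1-h) \le -h$, each value is at most $-n$, which is the $e^{-n}$ bound used before the union bound.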

In particular, for p = the complexity is ©(n"* exp(Y^)). In fact, for p = (1 + o(I))/ln(n) < 1/4, we 

have a complexity which is $\Theta(n^4\exp(n(p + p^2))) = 2^{(1+o(1))n/\log n}$, so that the threshold for $p$ between the
two algorithms is in $(1 + o(1))/\ln n$.